Distributors ponder a systemd change [LWN.net]

Distributors ponder a systemd change

Posted Jun 7, 2016 23:20 UTC (Tue) by GhePeU (subscriber, #56133) [Link] (19 responses)

This is insane.

A change of this magnitude can't just be introduced like this, with no discussion at all and the expectation that users must suddenly launch commands differently, downstream developers must suddenly depend on systemd and system administrators everywhere must start fiddling with configuration files just to revert to the normal, useful behaviour of any Linux system pre-this idiocy after users start seeing things breaking (with negligible consequences like launching a process overnight before leaving and discovering in the morning that no, the results you were expecting aren't there because three different people didn't each do something to prevent the issue).

Insane, just insane, and now I'm really starting to believe that maybe the people who complained so much about previous "innovations" weren't so wrong.

Distributors ponder a systemd change

Posted Jun 7, 2016 23:31 UTC (Tue) by GhePeU (subscriber, #56133) [Link] (9 responses)

And I forgot, OK, maybe tmux and screen and nohup and who knows what else will one day be fixed, but what of the case when something takes longer than expected, or you just have to close the connection sooner than you thought? Will background-and-disown cease to work too?

Also, I'm not sold on this supposed difference between "servers" and "workstations" or "desktop systems." A Linux system is a Linux system is a Linux system. I regularly log in remotely to so-called "workstations" and "desktop systems," at work and at home, and I don't see why screen should suddenly stop working as intended on my desktop PC just because I'm running Fedora instead of Red Hat.

Distributors ponder a systemd change

Posted Jun 8, 2016 3:54 UTC (Wed) by pizza (subscriber, #46) [Link] (6 responses)

> Will background-and-disown cease to work too?

It was never guaranteed to work unless you took explicit steps to ensure as such when you launched the process. The fact that it sorta mostly did (except when it randomly didn't) was more luck than any sort of explicit design decision.

Distributors ponder a systemd change

Posted Jun 9, 2016 16:58 UTC (Thu) by ksandstr (guest, #60862) [Link] (5 responses)

>It was never guaranteed to work unless you took explicit steps to ensure as such when you launched the process. The fact that it sorta mostly did (except when it randomly didn't) was more luck than any sort of explicit design decision.

So how do you justify systemd's new default explicitly breaking what even you recognize, above, having worked before?

Distributors ponder a systemd change

Posted Jun 9, 2016 17:20 UTC (Thu) by pizza (subscriber, #46) [Link] (4 responses)

> So how do you justify systemd's new default explicitly breaking what even you recognize, above, having worked before?

FYI, on my personal systems I explicitly turned KillUserProcesses *on*, because I actually want that behavior. A couple of years ago I replaced my regular uses of nohup and screen with native systemd units or timers or whatever was appropriate, and haven't looked back since.

On the old-school shell server I administer, I've left that feature off, and I will do so until at least screen and tmux are shimmed to request proper login sessions. Once that's done, I'll flip the switch there too, and then I can finally get rid of my periodic process reapers that have to clean up after misbehaving crap.

This is a change, yes. But it's a change that, after a very minor amount of learning, leaves me with a more robust system that requires less ongoing attention than before. (Call me strange, but I believe in using the best tools for the task at hand)
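For readers who want to check which way the switch is flipped on a given box, here is a minimal Python sketch. It assumes the setting lives as `KillUserProcesses=` under a `[Login]` section in logind.conf-style text, and that the compiled-in default is "yes" as described for v230 in this thread; distributions may build with a different default. The function name and the default constant are hypothetical, and drop-in files are ignored.

```python
import configparser

# Assumption: upstream v230 default as discussed in this thread.
# A distribution may compile in a different default.
ASSUMED_UPSTREAM_DEFAULT = True


def kill_user_processes(conf_text, default=ASSUMED_UPSTREAM_DEFAULT):
    """Return the effective KillUserProcesses value from logind.conf-style text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # getboolean accepts 1/yes/true/on and 0/no/false/off, case-insensitively.
    # Commented-out lines ("#KillUserProcesses=no") are skipped, so the
    # fallback applies when the option is absent.
    return parser.getboolean("Login", "KillUserProcesses", fallback=default)


if __name__ == "__main__":
    sample = "[Login]\nKillUserProcesses=no\n"
    print(kill_user_processes(sample))
```

This only reads a single text blob; a real check would also have to account for however the local systemd build merges its configuration files.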

Distributors ponder a systemd change

Posted Jun 12, 2016 13:31 UTC (Sun) by jspaleta (subscriber, #50639) [Link]

Exactly... I'm starting to do this too: shutting down process lingering as much as I possibly can, and only enabling it when I'm sure I need it.
And I really love not having to run bolt-on process reapers.

I'm trying to keep my development system and even my workstation as locked down as my production environment now, and tracking the configuration differences, so I know exactly why I'm relaxing constraints on the dev system. I want to push against production constraints using non-production workloads in unexpected ways and see what breaks. The amount of relearning isn't that bad. I mean it's not like relearning to jump to python3... this is minor.

Distributors ponder a systemd change

Posted Jun 19, 2016 3:53 UTC (Sun) by zblaxell (subscriber, #26385) [Link] (2 responses)

My first (and last) encounter with systemd three years ago revolved around this behavior. Some Yocto distribution or other had decided to turn this on by default, and it ruined my day.

Since then I've copied the behavior for myself, in the form of a half dozen five-line shell scripts that replicate systemd's cgroup behavior. Every aspect of the cgroups' lives--how much RAM, CPU, and IO they can use, and making sure processes run, live, and die when they're told to--can be handled this way. It's _awesome_, and it's definitely one of the better ideas coming out of the systemd project.

The other thing I realized was that sysvinit had been ruining my days for years, and systemd was going to continue that pattern. To isolate myself from upstreams that should know better, but make breaking behavior changes anyway, I replaced init with a shell script. It's a little longer than five lines--ranging from 55 to 155 lines of code depending on whether it's a desktop, embedded, or server workload--but I haven't looked back since.

It was a painful transition with a bit of learning curve, but it needs much less attention than before. Apparently the best tools for the task at hand were the Unix shell, the & operator, and some small syscall wrapper programs.

Distributors ponder a systemd change

Posted Jun 20, 2016 7:56 UTC (Mon) by zlynx (guest, #2285) [Link] (1 responses)

I'd have to double check but I am pretty sure the shells don't do init's job properly. Signal handling and child reaping, if I recall correctly.

You can get away with it for system rescue, but long term?

Distributors ponder a systemd change

Posted Jun 20, 2016 14:01 UTC (Mon) by zblaxell (subscriber, #26385) [Link]

To be clear, this was never intended to be a rescue system. We did a pilot project and the results were so successful (QA particularly enjoyed having a much more repeatable testing experience) that we promoted it to production and formally terminated plans to switch to anything else. We deploy everything this way now.

I'm not sure if _any_ shell works, but bash and dash do. Any shell that can trap signals (i.e. all of them) and that uses PID 0 as the argument to waitpid (all of them written after 1987) does this just fine. The kernel blocks most of the fatal signals anyway. If /bin/sh is segfaulting you have big problems and you should probably panic the kernel to stop them from getting worse.
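The child-reaping half of that job can be sketched in a few lines of Python. This is an illustration of the general technique, not the shell-script init described above: the helper names are hypothetical, and the sketch passes -1 to waitpid (wait for any child) and stops once the children it started have been collected, so that it cannot block forever on unrelated children.

```python
import os


def spawn(exit_code):
    """Fork a child that exits immediately with exit_code; return its PID."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit without running Python cleanup
    return pid


def reap(expected_pids):
    """Wait for children until every PID in expected_pids has been collected.

    Returns a dict mapping each reaped PID to its exit code. Children killed
    by a signal are reaped too but get no entry in the dict.
    """
    pending = set(expected_pids)
    statuses = {}
    while pending:
        try:
            pid, status = os.waitpid(-1, 0)  # -1: any child of this process
        except ChildProcessError:
            break  # no children left at all
        pending.discard(pid)
        if os.WIFEXITED(status):
            statuses[pid] = os.WEXITSTATUS(status)
    return statuses


if __name__ == "__main__":
    pids = [spawn(code) for code in (0, 3, 7)]
    print(reap(pids))
```

A real PID 1 would run the wait loop for as long as the system is up and would also inherit orphaned processes; the bounded loop here just keeps the example finite.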

Distributors ponder a systemd change

Posted Jun 8, 2016 12:56 UTC (Wed) by amarao (guest, #87073) [Link] (1 responses)

I see no such problem on any modern stable distribution. I think everyone will have plenty of time to read the upgrade notes for their OS and adapt to the coming changes. This will happen in the future; it is not happening now.

If you are sitting on the bleeding edge your butt is bleeding. But you are on the edge.

Distributors ponder a systemd change

Posted Jun 10, 2016 9:56 UTC (Fri) by linuxrocks123 (subscriber, #34648) [Link]

> If you are sitting on the bleeding edge your butt is bleeding. But you are on the edge.

Like this?

https://www.youtube.com/watch?v=6M17aG_Po2Y

Distributors ponder a systemd change

Posted Jun 8, 2016 3:28 UTC (Wed) by pizza (subscriber, #46) [Link] (6 responses)

> A change of this magnitude can't just be introduced like this, with no discussion at all and the expectation that users must suddenly launch commands differently [...]

If all of this is "no discussion at all" one wonders what you consider to be actual "discussion".

BTW, the only people for whom this change in defaults broke anything were those who deliberately opted to use fairly bleeding-edge, rolling-release distributions (eg Fedora Rawhide or Debian Unstable) that are, by their very nature, intended to flag potential issues due to [un-]anticipated changes in behavior.

I for one agree with our esteemed editor when he says "For all the fuss, one might well argue that the development community is working as it should."

Distributors ponder a systemd change

Posted Jun 8, 2016 7:41 UTC (Wed) by rschroev (subscriber, #4164) [Link] (5 responses)

> > A change of this magnitude can't just be introduced like this, with no discussion at all and the expectation that users must suddenly launch commands differently [...]

> If all of this is "no discussion at all" one wonders what you consider to be actual "discussion".

An actual discussion happens before the act, not after. In an actual discussion, one listens and considers arguments from other parties. None of that happened.

Distributors ponder a systemd change

Posted Jun 8, 2016 8:43 UTC (Wed) by ovitters (guest, #27950) [Link] (3 responses)

It was discussed, you just didn't notice the discussion.

Distributors ponder a systemd change

Posted Jun 8, 2016 8:54 UTC (Wed) by jubal (subscriber, #67202) [Link] (2 responses)

a.k.a. “it was discussed, but not with those who will be most affected”.

Distributors ponder a systemd change

Posted Jun 8, 2016 10:02 UTC (Wed) by pboddie (guest, #50784) [Link]

"But the plans were on display..."

Distributors ponder a systemd change

Posted Jun 12, 2016 13:14 UTC (Sun) by jspaleta (subscriber, #50639) [Link]

where did you expect it to be discussed and it wasn't?

There are a hell of a lot of communication channels... is everyone on the same page as to where they expect discussion to happen with regard to upstream changes for any project?

-jef

Distributors ponder a systemd change

Posted Jun 8, 2016 21:39 UTC (Wed) by HenrikH (subscriber, #31152) [Link]

But the act has not happened yet. So far v230 is only available in unstable distributions, which exist to test out just these kinds of things. As our dear editor wrote in the article, Linux is not Windows, so what ends up on your machine is not always the exact same thing that upstream delivers.

Distributors ponder a systemd change

Posted Jun 8, 2016 12:17 UTC (Wed) by vadim (subscriber, #35271) [Link]

Tempest in a teapot.

It's up to distributions to set their policy and to configure the packages however they wish. If a distro blindly ships software without checking whether the changes in the new version will result in something undesirable, then that's a problem with the release process.

That's the whole point of a distribution after all, and the reason besides convenience why we don't just tar -xvf ; make; make install everything.

Distributors ponder a systemd change

Posted Jun 8, 2016 15:11 UTC (Wed) by johannbg (guest, #65743) [Link]

Yes it can. What's insane is this whole concept and expectation of upstream marching downstream, holding its application man pages up high and knocking on communities' doors like some Mormons so it can preach to the choir or justify its own upstream community decisions and changes. That is just moronic at best.

Upstream should always reflect how things should be (from its own perspective), while downstream reflects how things are, or at least how things are as relevant to them, since these things can vary by as many downstream sources as there are.

It falls squarely under the downstream package maintainers' responsibility to monitor upstream changes and, if they deem that some upstream change warrants discussion in the downstream community, to engage with their own respective community according to its guidelines, procedures and processes and discuss it, then act according to the consensus reached in said community.

Distributors ponder a systemd change

Posted Jun 7, 2016 23:29 UTC (Tue) by acollins (guest, #94471) [Link] (33 responses)

I'm actually a big fan of systemd but this change was introduced in the completely wrong way.

Breaking an extremely common usecase and then telling everyone to fix their packages after the fact is not acceptable.

This should have followed the kernel process, where a change like this is clearly communicated months/years ahead of time and the community given time to make the transition. It's very disappointing that they've ignored the lessons learned over the past decades on how things like this should be handled.

Distributors ponder a systemd change

Posted Jun 8, 2016 3:31 UTC (Wed) by pizza (subscriber, #46) [Link] (32 responses)

> Breaking an extremely common usecase and then telling everyone to fix their packages after the fact is not acceptable.

Assuming your distribution doesn't do it for you before you even get the update, the immediate solution is "flip the default back".

> This should have followed the kernel process, where a change like this is clearly communicated months/years ahead of time and the community given time to make the transition.

This feature was in the very first public systemd release. How much longer should they have waited?

Distributors ponder a systemd change

Posted Jun 8, 2016 4:37 UTC (Wed) by drag (guest, #31333) [Link] (1 responses)

I definitely agree.

Systemd releases are not intended for end users. They are low-level Linux plumbing software. The intended audience are distributions and system integrators. Nothing systemd is doing here is forcing anybody to do anything at all.

If distributions don't like it they just switch it back. It's their job as distributions to be aware of these sorts of things and make these sorts of decisions. If the distributions are so clueless to miss this sort of thing or they abdicate their rights to make their own choices and just accept code carte blanche from upstream... then what is the point of distributions at all? Users would be better off just cutting out the middle man and just pull directly from upstream in some sort of automated 'linux from scratch' approach to installing Linux.

Distributors ponder a systemd change

Posted Jun 9, 2016 6:17 UTC (Thu) by NightMonkey (subscriber, #23051) [Link]

"Users would be better off just cutting out the middle man and just pull directly from upstream in some sort of automated 'linux from scratch' approach to installing Linux."

God, I love Gentoo. :)

Distributors ponder a systemd change

Posted Jun 8, 2016 4:43 UTC (Wed) by error27 (subscriber, #8346) [Link] (4 responses)

Until screen and tmux were updated for a start.

Distributors ponder a systemd change

Posted Jun 8, 2016 5:50 UTC (Wed) by rengolin (guest, #48414) [Link] (3 responses)

Aren't we picking favourites here? How much longer should people wait for those to change?

Any stable distro can protect you from this change (I use Arch), so I don't see this as a big deal at all.

The only slight problem I see is that they should have worked directly with those more affected (ex. tmux and screen) *before* defining what the interface looks like, to avoid long lasting design problems. Though, this is again, chicken and egg, as defining who you talk to first is not easy.

All in all, noisy and a tad inefficient, but nothing out of the ordinary.

Distributors ponder a systemd change

Posted Jun 9, 2016 9:33 UTC (Thu) by Seegras (guest, #20463) [Link] (2 responses)

You don't "wait for others to change" -- you implement.

And this means, if you want to change the default there, to PROVIDE A FUCKING FIX for afflicted programs such as nohup, screen and tmux. Even if it's your forked version of it, and furthermore ANNOUNCE IN ADVANCE to your (biggest) downstreams when you're changing the default.

I can see the reasoning behind what systemd did. But I absolutely object to the way they did and communicated it.

Distributors ponder a systemd change

Posted Jun 9, 2016 14:10 UTC (Thu) by johannbg (guest, #65743) [Link] (1 responses)

If component A exposes a bug in component B (regardless of whether that bug was triggered by a change in project A's defaults or simply encountered when the two are used together), then the community behind component B is responsible for fixing it. Bugs are fixed where they belong, just like any other bug that gets filed. Sure, all patches are welcome, but there is absolutely no obligation on project A, its community, its developers, or whoever discovered and reported the bug to fix it as well. You must be a very special individual if you really expect an upstream that flips a switch in its own project's defaults to run around the whole internet fixing every breakage and bug in whatever corner of the software galaxy, in projects that have no relation to its own o_O.

And with regard to announcements, it's expected that downstream package maintainers are on good terms with upstream (which all the major distributions, Arch/CoreOS/Debian/Fedora/Gentoo/OpenSuse/Ubuntu etc., are with systemd) and follow upstream changes closely (which they do) to be prepared for changes like these (which they were) and start a dialog with the downstream community should they feel it necessary (which, as far as I know, all of them did, some before the official release of systemd 230, others after).

So the question here is how you missed that discussion in your community (missing such discussions usually indicates that people aren't active contributors in their communities, or aren't part of them at all, as in just end users).

Distributors ponder a systemd change

Posted Jun 9, 2016 18:59 UTC (Thu) by flussence (guest, #85566) [Link]

>If component A exposes a bug in component B

If.

Distributors ponder a systemd change

Posted Jun 8, 2016 7:44 UTC (Wed) by rschroev (subscriber, #4164) [Link] (23 responses)

> How much longer should they have waited?

Until there is a consensus in the Linux community (not just in Poettering's mind or the systemd dev community) that this is the right thing to do.

Distributors ponder a systemd change

Posted Jun 8, 2016 8:21 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (19 responses)

> Until there is a consensus in the Linux community

So, like, until 2056?

Distributors ponder a systemd change

Posted Jun 8, 2016 8:28 UTC (Wed) by rschroev (subscriber, #4164) [Link] (18 responses)

Well yes, or possibly even never. You simply don't make an intrusive change like this without some sort of consensus.

Distributors ponder a systemd change

Posted Jun 8, 2016 8:32 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (17 responses)

That's why I'm glad that actual Linux (kernel and userland) developers don't care much about "consensus".

Distributors ponder a systemd change

Posted Jun 8, 2016 21:19 UTC (Wed) by xtifr (guest, #143) [Link] (15 responses)

But the kernel developers at least claim to care about breaking working code. And in this case, we're talking about a change that has the potential to render a system completely unusable! If I background (as I frequently do) a system update, and then log out, this change means the update will be killed at some unknown point. If this happens while certain key files are being updated, the result can be catastrophic. And if I later log back in and use ps to see if the update is still running, and find it isn't, I may decide it's ok to shut down, destroying any remote chance I might have had at a simple recovery. (Assuming I can still log back in in the first place....)

If it's a remote system, staying logged in during the entire update process may not even be an option. My laptop and I may have places we need to be.

Compared to the potential for catastrophe this change brings, the (admittedly real and very annoying) problem of programs which fail to shut themselves down properly when asked to do so seems minor.

Now, I admit that sometimes, a potentially system-destroying change is unavoidable, for various reasons. But in a case like that, there really have to be bright, shining, neon lights that say "if you do X at this point, your system may be corrupted!" Quietly making such a change without warning is simply not acceptable!

Fortunately, Debian seems to have restored the older default, so my home systems, at least, are safe.

Distributors ponder a systemd change

Posted Jun 8, 2016 22:43 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (13 responses)

> But the kernel developers at least claim to care about breaking working code.

Kernel developers do a lot of stuff that allows breaking existing code. Kernel build option defaults are routinely flipped to change some parts of user-visible behavior.

It's the same with systemd - its interface is stable.

Distributors ponder a systemd change

Posted Jun 9, 2016 4:35 UTC (Thu) by xtifr (guest, #143) [Link] (12 responses)

I was going to say I couldn't think of a kernel change that has as much potential for widespread breakage, but then I remembered the fsync nightmares of a few years back. So...you have a point.

Still, that's the only recent example I can think of where the kernel folks did something with as much potential for catastrophic breakage as this has. If you have some other examples, which affected programs as widely used as screen, tmux, emacs-daemon, and nohup, I'd be curious to hear them.

Distributors ponder a systemd change

Posted Jun 9, 2016 7:21 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (10 responses)

Technically, this change does not affect them - they can work just fine if this setting is disabled. It's a simple default value switch - you could have used the autokill mode since when it became available in 2013. Still, it would be nice if systemd developers at least tried to fix other important stuff first.

If you want a comparable change in Linux, then remember the /dev/hda -> /dev/sda switch. That broke a _lot_ of stuff that was doing stupid things like detecting hard disks by checking for /dev/hd? devices.

Distributors ponder a systemd change

Posted Jun 9, 2016 10:59 UTC (Thu) by xtifr (guest, #143) [Link]

It doesn't affect them _if they notice_ before the system is trashed. In other words, if they're aware of it. Which is really my main complaint. The danger potential is very high, but the number of warnings given was near zero. I actually had this version of systemd installed with the dangerous new default, for a little while, and had no idea. (Which, yes, I know, is the risk I take running bleeding edge software from Debian's unstable distro. Even so...)

On the other hand, neither this nor the fsync thing actually bit me, while the /dev changes did, so, good example. :)

Distributors ponder a systemd change

Posted Jun 9, 2016 13:24 UTC (Thu) by johannbg (guest, #65743) [Link] (8 responses)

"would be nice if systemd developers at least tried to fix other important stuff first."

Which issues in systemd do you feel should have higher development priority?

Distributors ponder a systemd change

Posted Jun 9, 2016 17:19 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (7 responses)

Sorry, wrong wording - they should have proposed fixes to other projects that might reasonably depend on the current behavior.

Distributors ponder a systemd change

Posted Jun 9, 2016 19:17 UTC (Thu) by viro (subscriber, #7872) [Link] (6 responses)

FWIW, the thing that Johann doesn't seem to be able to grasp is this: the world does not owe us to recognize the greatness of our ideas and apply whatever efforts it takes to make them work. Not to me, not to him, not to Linus, not to Lennart, not to *anybody*. No matter how great the idea really is. Most of us get it by the time we are out of our teens...

If distros decide, en masse, that reverting the change of default is less headache than doing urgent fixups to screen/tmux/whatnot, then the whole thing had been handled wrong. By definition. And responsibility for the choice of tactics that happened to backfire is upon those who chose it. Especially since the headache for distros had been easy to anticipate, along with the likely areas where that headache would come from. FWIW, a simple search shows reports of screen(1) breakage *5* *years* *ago* with that very thing turned on on the reporter's box. In Fedora. With Johann directly involved in handling of that report, amusingly enough, so if systemd developers were unaware of the likely sources of trouble, a part of the blame was actually his...

Distributors ponder a systemd change

Posted Jun 9, 2016 19:26 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

Sure. However, it is OK to propose changes that require some work from other projects.

If they fail to take off after many years of trying then the idea was obviously bad (see: Python 3) and probably should be adjusted and/or abandoned.

Distributors ponder a systemd change

Posted Jun 10, 2016 13:12 UTC (Fri) by vonbrand (subscriber, #4458) [Link] (2 responses)

Python 3 is slowly making inroads, held back by the still not complete fixup of key libraries. Usage stands at around 50% Python 3 and 70% Python 2 (the 20% overlap is code used with both).

Distributors ponder a systemd change

Posted Jun 10, 2016 18:25 UTC (Fri) by drag (guest, #31333) [Link] (1 responses)

If everybody refuses to accept that bad things happened in the python2 to 3 transition that caused problems and delayed progress, then it's likely that it'll just happen again, and again, and again.

There are two ways of screwing something up, or 'making mistakes'

1. No-fault. Based on information available at the time it seemed like it was a good idea. Unfortunately it turned out to have jacked everything up.

2. Fault. Based on information available at the time I knew it was a bad decision, but I thought I could get away with it. Too bad I got caught jacking everything up.

The first one is fine. It can't be avoided. It's part of how technology progresses and dealing with mistakes is just something we have to do. The second one is where you deserve to be removed from a position of trust.

Even if you have people disagreeing with you about choices you make it doesn't mean you fall into category 2, even if they are ultimately right. They just now have proof of their correct decisions. Doesn't mean they will be correct next time, though.

A lot of people see mistakes type 1 and then assign malicious intent in their minds to transform them to type 2, then go cry on the internet. A lot of people see people make mistakes type 1 and then try to erase the mistake because of a confusion that only type 2 mistakes exist, or they fail to realize the distinction.

Personally I like python3. I don't know what they could have done to improve the transition. The alternative seems to be what perl has done.

Distributors ponder a systemd change

Posted Jun 13, 2016 15:32 UTC (Mon) by niner (subscriber, #26151) [Link]

They could have improved the transition by providing backwards compatibility which would have been possible. It would even be possible still. Instead they required everyone to port their libraries and have all dependencies being ported before one can port one's own code. In hindsight clearly not the winning strategy and I wish they'd have used the chance to at least make a couple more steps forward (think graphemes and GIL) when they insisted on backwards incompatibility.

Distributors ponder a systemd change

Posted Jun 9, 2016 22:20 UTC (Thu) by johannbg (guest, #65743) [Link] (1 responses)

I have no problem accepting responsibility and taking blame when I deserve it, so if that shit is on me, that shit is on me and I'll try to do better next time. I don't particularly take pride in having had to leave a project, which left systemd only half integrated in the distribution (and no one picked up that work and probably never will), but I had no other choice, so you can just as well throw that on the shit pile as well. A man is worthless if he cannot live up to his own standards and since I was unable to complete my work there I most certainly did not live up to mine...

Distributors ponder a systemd change

Posted Jun 10, 2016 0:23 UTC (Fri) by viro (subscriber, #7872) [Link]

Good sodding grief... seek professional help. Seriously. This kind of histrionics would've been over the top even in a teenager, and you are past that age.

a) left the project (fedora, I take it?) != lost all ability to contact systemd developers. Their list isn't closed, AFAIK.

b) whether you've failed to inform them about screen(1) breaking in such situation back then or not, they certainly could've looked themselves. Searching for bug reports mentioning that setting is not a rocket science.

c) I've no comment on the state of Fedora (before or after systemd transition - it's not something I use other than for testing and that only when I can't reproduce a bug on something saner; I sure as hell do not watch the politics in it), and your choice of standards is up to you, but that amount of drama has got to be counterproductive whatever those standards might be. I really, honestly have no fucking idea how you've parted ways with that project; judging by your postings years later it had to have been messy as hell and at a guess hadn't been any calmer than said postings (BTW, as an aside - what the hell _is_ in phoenix? You keep referring to it, and by the context it sounds like a center of some Evil Corporate Cabal(tm), presumably RH-related one. I'm fairly sure that RH headquarters are still in Raleigh and AFAIK there's no office in Phoenix - AZ or otherwise...)

Distributors ponder a systemd change

Posted Jun 18, 2016 16:29 UTC (Sat) by Wol (subscriber, #4433) [Link]

> Still, that's the only recent example I can think of where the kernel folks did something with as much potential for catastrophic breakage as this has.

Not recent, but it was a deliberate decision by Linus and it caused chaos ...

Who remembers him ripping the swap optimisation code out of kernel 2.4? Leading to a spate of linux systems crashing as sysadmins discovered "swap should be twice ram" was NOT an old wives' tale ...

Cheers,
Wol

Distributors ponder a systemd change

Posted Jun 9, 2016 8:49 UTC (Thu) by pflykt (subscriber, #2757) [Link]

> Fortunately, Debian seems to have restored the older default, so my home systems, at least, are safe.

...and which is the correct action for a responsible distribution to take, considering what its users think is the intended behavior, if I may add. This, and figuring out how to address the initial problem, is where the distributor really adds value.

Distributors ponder a systemd change

Posted Jun 13, 2016 8:23 UTC (Mon) by Jluis (guest, #28564) [Link]

But the kernel makes a true effort not to break userland.

Distributors ponder a systemd change

Posted Jun 8, 2016 15:24 UTC (Wed) by johannbg (guest, #65743) [Link]

"consensus in the Linux community"

You do realize that there exists no such thing as "consensus in the Linux community".

If someone as much as farts in the wrong direction, he can end up creating 10 forks, 5 standards, 3 foundations and one distribution as a result, because someone in some community did not like how that fart smelled.

Distributors ponder a systemd change

Posted Jun 8, 2016 15:48 UTC (Wed) by anselm (subscriber, #2796) [Link] (1 responses)

“Consensus.” You keep using that word. I don't think it means what you think it means.

See RFC 7282:

Engineering always involves a set of tradeoffs. It is almost certain that any time engineering choices need to be made, there will be options that appeal to some people, but are not appealing to some others. In determining consensus, the key is to separate those choices that are simply unappealing from those that are truly problematic. If at the end of the discussion some people have not gotten the choice that they prefer, but they have become convinced that the chosen solution is acceptable, albeit less appealing, they have still come to consensus. Consensus doesn't require that everyone is happy and agrees that the chosen solution is the best one.

Distributors ponder a systemd change

Posted Jun 9, 2016 21:21 UTC (Thu) by mstone_ (subscriber, #66309) [Link]

If there's ever consensus across the linux community, it must mean something other than what you posted.

Distributors ponder a systemd change

Posted Jun 8, 2016 16:06 UTC (Wed) by lsl (subscriber, #86508) [Link]

> This feature was in the very first public systemd release. How much longer should they have waited?

The feature itself is pretty useful and I assume no one has any issues with it. It's the flipping of the default setting that's the disruptive (and therefore controversial) change.

Distributors ponder a systemd change

Posted Jun 7, 2016 23:54 UTC (Tue) by darwish (guest, #102479) [Link] (47 responses)

If you look at this change without history context, and without our UNIX administration experience bias dragging us back, it makes a whole lot of sense. In all computing scenarios, it's advised to force-release allocated resources upon exit. For example, the kernel closes all file descriptors and deallocates memory when a process exits. _This happens even if the process did not clean up its allocated resources in a good manner before exit_. In the same vein, it seems rational for systemd-logind to force-close session processes if they did not clean up after themselves in a good manner before session exit. As a system-level design decision, this is so obvious it's not even funny; it also matches the good practice of keeping the system state clean and sane.

Now people will complain about how this will break tmux, screen, nohup, etc. But really, if systemd did not take this bold first step, no one would. I've always admired systemd's boldness: it's that boldness that forced a well-engineered layer above the kernel, no matter how much the UNIX administrators and server folks cry to keep everything set in stone. This boldness will help traditional Linux builds keep their competitive edge over other operating systems in the long term.

Distributors ponder a systemd change

Posted Jun 8, 2016 0:19 UTC (Wed) by pikhq (subscriber, #98351) [Link] (1 responses)

Meanwhile the rest of us are working in the land of historical context. "Don't break working code" ought to be taken as a given, but it isn't here. Hence the screaming.

Distributors ponder a systemd change

Posted Jun 11, 2016 19:01 UTC (Sat) by micka (subscriber, #38720) [Link]

Yes, the spacebar overheating is still essential to my workflow.

Distributors ponder a systemd change

Posted Jun 8, 2016 0:49 UTC (Wed) by khim (subscriber, #9252) [Link] (1 responses)

It's Ok to introduce “bold”, backward-incompatible changes when you are doing experimental work. Once your creation is in use by millions “bold steps” are no longer allowed. Think Windows Phone (Windows Mobile had 12% market share, Windows Phone 7 broke everything, and as a result Windows Phone will most likely die) and compare it to Windows itself (at introduction it was also pretty much incompatible with previous setups, but that just meant that people ignored it… it took off only when Windows 3.0 made it possible to use MS-DOS programs). Note that you can eventually remove stuff (Windows x64 no longer supports MS-DOS programs), but there must be a transition period.

The sane decision here would be a dialog which asks the user about it (similar to how the user is asked whether they want to keep their “Documents” directory name after a locale change), then adding code to screen/tmux/etc. (if people just flat out refuse to cooperate then OK, you could just include a link to the appropriate bug in the “release notes”), and so on.

IOW: this may be a good default, but it still breaks users' expectations without warning. This is really bad (the lack of warnings, that is). The change itself may be good, but “bold” moves like this are how you create not Linux but Plan 9: a good (as in: really good, no quotes) OS which nobody uses (not even its creators).

Distributors ponder a systemd change

Posted Jun 20, 2016 19:04 UTC (Mon) by ThinkRob (guest, #64513) [Link]

> It's Ok to introduce “bold”, backward-incompatible changes when you are doing experimental work. Once your creation is in use by millions “bold steps” are no longer allowed.

But that's the strength of the distro model, isn't it? Upstream projects can experiment, do cool stuff, etc., and the distros make sure that the parts they ship are configured to suit whatever the goals of the distro are (ease of use, niche-specific stuff, whatever).

And that's exactly what's happening here. systemd switched the default in their upstream repo, and it's up to the distros to determine whether to follow that change immediately, give people some lead time, or say "fuck it" and choose to ignore the change forever. All three are valid options depending on the distro's goals.

Distributors ponder a systemd change

Posted Jun 8, 2016 1:05 UTC (Wed) by viro (subscriber, #7872) [Link]

"If you look at [...] without history context, and without our [...] experience bias dragging us back, it makes a whole lot of sense."

Bravo. One rarely sees such a superb example of sarcasm these days. Mind if I steal that for the next time I need to suggest (politely) that such-and-such proposal is obviously crap to anyone with even a modicum of experience?

Distributors ponder a systemd change

Posted Jun 8, 2016 6:38 UTC (Wed) by ras (subscriber, #33059) [Link] (15 responses)

> without our UNIX administration experience bias dragging us back,

Dragging us back? That would be reasonable if the majority of Linux deployments were now multi-user machines whose users were miscreants prone to starting rogue processes with nohup, screen or whatever. In that case sysadmins and developers insisting that the defaults suit them would indeed be holding things back.

But that's not the case is it? The machines running systemd are dominated by embedded computers, servers and developer laptops. On these machines if the user knows enough to use nohup, screen or tmux, it's almost certain any process they want left running after logout should remain running. So surely that should be the default. In the very rare case that's wrong (I'm struggling to think of one - public use machines in a Uni - but do they still exist?) the sysadmin in charge of those boxes can set the KillUserProcesses option.
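For reference, KillUserProcesses is a setting in the [Login] section of logind's configuration. A minimal sketch of switching it on for such a public-use machine follows, assuming a drop-in file is used; the staging directory and the file name 10-kill-user-processes.conf are made up for illustration, and the file is only staged here rather than installed:

```shell
#!/bin/sh
# Sketch: stage a logind drop-in that enables KillUserProcesses.
# The staging directory and file name are hypothetical; on a real system the
# file would be copied (as root) into /etc/systemd/logind.conf.d/.
stage=$(mktemp -d)
dropin="$stage/10-kill-user-processes.conf"

cat >"$dropin" <<'EOF'
[Login]
KillUserProcesses=yes
EOF

# Show what would be installed.
cat "$dropin"
```

Setting the value to "no" instead gives the older behaviour that the rest of this thread is arguing about; logind has to re-read its configuration before either value takes effect.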

That aside, it looks to me as though this change wasn't introduced for the philosophical reasons you mention. It is a kludge to work around a Gnome design bug: https://github.com/systemd/systemd/issues/2900 In fact kludge is far too kind a word. It's a horrible hack.

Distributors ponder a systemd change

Posted Jun 8, 2016 8:13 UTC (Wed) by ovitters (guest, #27950) [Link] (14 responses)

I don't see how you say that there's a GNOME design bug? I saw this being repeated various times, but it has to do with systemd user sessions. Those user sessions are completely new. Under a user session the process managing that session should manage that session.

Please clarify.

Distributors ponder a systemd change

Posted Jun 8, 2016 11:12 UTC (Wed) by ras (subscriber, #33059) [Link] (13 responses)

> I don't see how you say that there's a GNOME design bug? I saw this being repeated various times, but it has to do with systemd user sessions.

I have trouble distinguishing between GNOME and systemd. To the extent they are separate projects, you may be right.

Roughly what happened was:

1. In the beginning there was login. Every process started after login was a child of it, the kernel used a very simple mechanism to track those children, and so it was easy to clean up on logout.

2. Then X and xdm replaced login, but every process was a child of xdm, and cleaning up on logout remained simple.

3. Then there was GNOME, and gdm, and later gdm spawned corba. Things were rapidly getting more complex, but nonetheless everything was a child process of gdm and so cleaning up on logout was still simple.

4. GNOME moves to dbus.

5. systemd takes over dbus.

6. systemd takes over session management - primarily via logind.

7. GNOME immediately adopts logind, causing much angst on Debian because it meant the default desktop required you to use systemd.

8. GNOME starts using dbus to lazily start services.

9. systemd starts dbus under a separate process tree (the one under systemd --user, as opposed to the one started by gdm).

10. GNOME notices if the user logs in twice, they start services such as the evolution-address-book twice. Seems inefficient. They share services between two login sessions. For some services.

11. Consequently keeping track of what session owns what process becomes hard. Some things aren't killed properly when the sessions logout. Since logind is tracking the sessions, seems like a good idea to make it the systemd mob's problem. KillUserProcesses is implemented, and GNOME's problem is solved.

12. But no one is turning KillUserProcesses on, so GNOME sessions are still leaving services running. So systemd-230 changes the default to on.

And so here we are. If you tell me it's really systemd that's at fault then so be it; for me it's like picking between two peas in a pod.

Distributors ponder a systemd change

Posted Jun 9, 2016 2:10 UTC (Thu) by BradReed (subscriber, #5917) [Link]

excellent summary, thanks.

Distributors ponder a systemd change

Posted Jun 9, 2016 23:47 UTC (Thu) by xtifr (guest, #143) [Link] (4 responses)

Interesting. That definitely makes sense.

However, that suggests another possible approach to this problem. Instead of having evolution-address-book ignore sighup (which is what I assume it's doing now, so that it can survive one session's ending), have it respond to sighup by checking how many sessions it's attached to! If the number is one (or less), it can shut down gracefully, but if the number is higher, then it knows it should ignore the signal!

Of course, this would mean that the programs would have to know how many sessions they were attached to, but isn't that exactly the sort of thing dbus was created to handle?

This would place the problem where it belongs (on the GNOME processes which want this session-sharing feature), rather than on unrelated software (screen/tmux, etc., anything launched via nohup, and an unknown number of "little" home-rolled persistent background thingies).
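The proposal above can be sketched in a few lines of shell. This is only an illustration of the idea, not how any GNOME service works: the shared_service function and its hard-coded session count are hypothetical, and a real service would have to learn the count from dbus or logind rather than from an argument.

```shell
#!/bin/sh
# Hypothetical shared service: it tracks how many sessions use it and
# only exits on SIGHUP when the last one goes away.
log=$(mktemp)

shared_service() {
    sessions=$1                      # hypothetical: number of attached sessions
    trap 'sessions=$((sessions - 1))
          if [ "$sessions" -le 0 ]; then
              echo "last session gone, exiting" >>"$log"
              exit 0
          fi
          echo "$sessions session(s) left, staying" >>"$log"' HUP
    while :; do
        sleep 1 &
        wait $! || true              # wait returns early when the trap fires
    done
}

shared_service 2 &
svc=$!
sleep 1                              # let the service install its trap
kill -HUP "$svc"                     # first session ends: service stays
sleep 1
kill -HUP "$svc"                     # last session ends: service exits
status=0
wait "$svc" || status=$?
result=$(cat "$log")
rm -f "$log"
printf '%s\n' "$result"
```

The first SIGHUP is absorbed and the second one ends the service with exit status 0, which is the behaviour the proposal asks for without involving anything outside the service itself.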

Distributors ponder a systemd change

Posted Jun 10, 2016 0:50 UTC (Fri) by johannbg (guest, #65743) [Link] (2 responses)

"Good sodding grief... seek professional help. Seriously. "

What an excellent idea. We can go together, I for my chronic depression and you for your alter ego, which gets off on judging, putting down or belittling other people in comment sections and on mailing lists. ;)

Distributors ponder a systemd change

Posted Jun 10, 2016 9:31 UTC (Fri) by xtifr (guest, #143) [Link] (1 responses)

Huh?

Oh, you replied to the wrong post. Never mind. It happens.

(For a second I thought you were suggesting that my proposal was extremely crazy. Which it might be, but a quick skim showed me the post you were trying to respond to.)

Distributors ponder a systemd change

Posted Jun 10, 2016 11:08 UTC (Fri) by johannbg (guest, #65743) [Link]

Ah, I see. I was wondering why viro had not replied to that, since he's one of those "I have to have the last word" types who gloat or further drive in their belittlement of other beings. But yeah, I was responding to viro's comment half asleep late last night and accidentally clicked reply on the wrong comment.

I would have spotted it if the site's comment editor showed what's being responded to above rather than below itself (it arguably should do so at least above the comment preview, so that the preview is in full context with what's being responded to).

Sorry for the mistake.

Distributors ponder a systemd change

Posted Jun 11, 2016 15:13 UTC (Sat) by krake (guest, #55996) [Link]

> If the number is one (or less), it can shut down gracefully, but if the number is higher, then it knows it should ignore the signal!

My guess is that the problem is that the program is only attached to one session, or more specifically one bus.
I.e. previously each login had its own D-Bus session (session bus), but now there is one D-Bus "session" per user (user bus).

But again, this is just a guess.

Distributors ponder a systemd change

Posted Jun 10, 2016 11:27 UTC (Fri) by khim (subscriber, #9252) [Link] (1 responses)

This looks like a plausible theory but it's most definitely a wrong one. I've seen lingering GNOME processes left behind years before systemd was even imagined. I'm not really sure what spawned them or when (gvfs with its use of FUSE was the main culprit, but I think there were others, too), but that happened well before the introduction of systemd.

IOW: systemd solves the real and sizable problem which predates it, although it's not clear if that's the best solution available (well, it's certainly the best available solution in a sense that it's the only one that works, but it does not mean it couldn't be better).

Distributors ponder a systemd change

Posted Jun 11, 2016 5:46 UTC (Sat) by ras (subscriber, #33059) [Link]

> This looks like a plausible theory but it's most definitely a wrong one. I've seen lingering GNOME processes left behind years before systemd was even imagined.

So have I. Yes, it's definitely the same symptom. But the same symptom does not mean the underlying cause is the same - particularly in software.

Distributors ponder a systemd change

Posted Jun 12, 2016 15:37 UTC (Sun) by davidstrauss (guest, #85867) [Link] (1 responses)

> systemd took over DBus.

systemd *uses* DBus and provides its own DBus client library. It's basically what cURL's relationship to Apache is.

Just in case you're thinking about kdbus, it's not part of systemd, hasn't happened yet, and may or may not happen in the future.

Distributors ponder a systemd change

Posted Jun 12, 2016 19:43 UTC (Sun) by johannbg (guest, #65743) [Link]

Kdbus is dead, and the code in systemd that could use it is being removed to ensure it will never be used, so kdbus in systemd is a certain "will not happen in the future".

Distributors ponder a systemd change

Posted Jun 13, 2016 11:34 UTC (Mon) by gb (subscriber, #58328) [Link] (1 responses)

If there are GNOME processes requiring cleanup, why in bloody hell should this have any influence on anything else? Just ask dbus to list the processes connected to it AND 'gnome' and kill 'em.

System-wide change is clearly not necessary here!

Distributors ponder a systemd change

Posted Jun 13, 2016 13:33 UTC (Mon) by micka (subscriber, #38720) [Link]

Gnome is just an example of such a program, but as some people like to hate gnome, the simple fact that the name appeared somewhere makes it the center of grief (something it shares with systemd).
And I suppose not all such processes use dbus.

Distributors ponder a systemd change

Posted Jun 15, 2016 2:57 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

> KillUserProcesses is implemented, and GNOME's problem is solved.

The flag was implemented way long ago (before 2011). This is about changing the default.

Distributors ponder a systemd change

Posted Jun 8, 2016 8:48 UTC (Wed) by diegor (subscriber, #1967) [Link] (7 responses)

What you call boldness, I call carelessness. They introduced it not for security or for cleanliness, but just to put a (bad) fix on a problem in GNOME, without thinking about what would happen, asking for suggestions, or even adding a warning.

And there is already a "clean up the processes after logout" mechanism; it is called sighup. And it works. When I work on a remote server over ssh, I never see a process survive the logout unless it is meant to.

Now, the desktop is a different beast. Most of the problems come from programs launching new processes with nohup. That happens because they don't want the process to be killed when the parent process terminates. For example, when you open an attachment from your email client, the viewer is a child of the email client. So when you close the client, Unix, as expected, cleans up all its child processes. Because that is cleaner, right? Do you see the problem now? Cleaning up processes is a desirable thing in theory, but users complain, so everyone starts to mark child processes as not to be cleaned up.

A better solution would be to have a "launcher" process for the GUI. Instead of forking a new process, your email client can talk to the launcher and let it fork the new process. Then, when you log out, the launcher cleans up its own processes, and it can kill them with fire if that is really needed (without breaking everything else).

Distributors ponder a systemd change

Posted Jun 8, 2016 9:21 UTC (Wed) by matthias (subscriber, #94967) [Link] (4 responses)

sighup is not working (at least not all the times). Every program that has some cleanup to do before exiting is intercepting sighup (it has to). If the shutdown of the program fails, it is left running. I have seen such processes hanging around forever, possibly using 100% of CPU time.

Also, what is the difference between systemd killing your process and the launcher process of the GUI doing it? There has to be an opt-out for screen, tmux and nohup in both cases.

Distributors ponder a systemd change

Posted Jun 8, 2016 11:56 UTC (Wed) by ras (subscriber, #33059) [Link] (3 responses)

> sighup is not working (at least not all the times).

I remember I have had a process going infinite on logout, but can't actually recall when - it was a long time ago. Infinite loops after getting a sighup must be easy to track down.

What I can recall happening recently is a kernel driver misbehaving. I go through the rmmod, modprobe dance. But instead of fixing it, the processes using the driver hang on some unbreakable kernel lock, and the eventual kill -9 is a complete waste of time. The only way out is a reboot of a production machine. I gather race conditions during module removal are unavoidable, and regardless there always seem to be more of them. If the systemd or gnome guys have a fix for this, no matter how bad, I promise to call it a thing of beauty.

But using this sledge hammer to cure a simple infinite loop bug, and break backward compatibility - sorry, no.

Distributors ponder a systemd change

Posted Jun 8, 2016 13:51 UTC (Wed) by bronson (subscriber, #4806) [Link] (1 responses)

Since processes surviving logout has been a problem for decades, it seems like the infinite loops are not so easy to track down after all?

Distributors ponder a systemd change

Posted Jun 8, 2016 22:03 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

Not if they happen in a subset of conditions. For example, if you have an encrypted home directory that is lazily unmounted after the logout.

Distributors ponder a systemd change

Posted Jun 8, 2016 14:51 UTC (Wed) by diegor (subscriber, #1967) [Link]

But if not even kill -9 can kill it, systemd can't kill it either.

Usually a process for which the kernel is servicing a system call cannot be interrupted until the kernel has finished.

Distributors ponder a systemd change

Posted Jun 8, 2016 11:23 UTC (Wed) by niner (subscriber, #26151) [Link] (1 responses)

If SIGHUP worked that well, we wouldn't currently have almost 100 useless processes from previous sessions on our development server and I wouldn't have to reach for killall from time to time to clean up.

Distributors ponder a systemd change

Posted Jun 10, 2016 22:23 UTC (Fri) by xtifr (guest, #143) [Link]

So you have a bunch of programs with bugs (don't respond properly to SIGHUP), and this will merely hide those bugs!

Wouldn't it be better in the long run to have those bugs fixed? And wouldn't it be easier to fix them if the bugs weren't hidden?

In the case of the various GNOME-specific processes which are supposed to be shared between sessions and thus shouldn't necessarily just die whenever one session shuts down (so they can't just die on SIGHUP), they should, instead, keep track of how many sessions they've been attached to (probably via dbus or something), and kill themselves when the last session goes away.

And this kill-everything option could remain an option for people like you who have buggy processes which don't shut themselves down when they should (either when receiving SIGHUP or when their last session goes away). I'm fine with that. But making it a default for everyone, and forcing every program that might be intended to survive beyond logout to be modified just for systemd-based systems seems like the wrong choice.

Distributors ponder a systemd change

Posted Jun 8, 2016 21:55 UTC (Wed) by JoeF (guest, #4486) [Link] (17 responses)

If I start a process with nohup, then I know what I am doing and I want it to stay around.
And the system damn well has to follow my wishes.
A user-mode process then going and quietly killing the process against my explicit wish is a really bad thing!
They need to honor things like nohup.

Distributors ponder a systemd change

Posted Jun 8, 2016 23:37 UTC (Wed) by nybble41 (subscriber, #55106) [Link] (15 responses)

> They need to honor things like nohup.

What makes you think they aren't? If you start a process with nohup then it won't be sent a SIGHUP when the shell's controlling terminal is closed, just like it says in the manual. Nothing about that has changed. It was always a rather imprecise way to manage process lifetimes, though.
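A small experiment makes the point concrete. This sketch assumes only the standard nohup, sleep and kill utilities; it shows that nohup shields a process from SIGHUP and from nothing else, so any other signal still ends it.

```shell
#!/bin/sh
# nohup makes the child ignore SIGHUP; other signals still act normally.
nohup sleep 30 >/dev/null 2>&1 &
pid=$!
sleep 1                 # give nohup time to set up and exec sleep

kill -HUP "$pid"        # ignored because of nohup
sleep 1
kill -TERM "$pid"       # not ignored: this one terminates the process

status=0
wait "$pid" || status=$?
# wait reports 128 + signal number when the child died from a signal.
echo "exit status: $status"
```

On Linux the reported status is 143 (128 plus SIGTERM's 15) rather than 129, which is what a death by SIGHUP would have produced.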

Killing a user's processes on logout is hardly a new idea. This was the policy on the shared Linux servers at my university over a decade ago, for example. The ability to do this reliably with logind rather than ad-hoc scripts would be a welcome change.

Should this be the default? I'd say that's for distributions to decide. But it's a nice feature to have available, and IMHO any programs (like screen and tmux) which are meant to provide long-running services independent of the session they were started from should register themselves as separate user sessions, regardless of whether killing processes on logout becomes the default.

Distributors ponder a systemd change

Posted Jun 9, 2016 18:12 UTC (Thu) by JoeF (guest, #4486) [Link] (14 responses)

But systemd doesn't send SIGHUP, it sends SIGTERM and then SIGKILL. That's the problem.
They should send SIGHUP, or not send anything at all.

Distributors ponder a systemd change

Posted Jun 10, 2016 15:55 UTC (Fri) by nybble41 (subscriber, #55106) [Link] (13 responses)

> But systemd doesn't send SIGHUP, it sends SIGTERM and then SIGKILL.

Systemd sends both SIGHUP and SIGTERM, followed some time later by SIGKILL. Your objection is mainly to the SIGKILL, but this is the same process that is used to terminate programs on system shutdown, and the correct way to end a user's session after the user logs out.
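The sequence described above (polite signals first, SIGKILL after a grace period) can be sketched in plain shell. This is not systemd's code; the stop_child function, its two-second grace period and the deliberately stubborn test child are all assumptions made for the illustration.

```shell
#!/bin/sh
# Sketch of "SIGHUP and SIGTERM first, SIGKILL after a grace period".
# stop_child and its grace period are illustrative, not systemd's code.
stop_child() {
    pid=$1
    grace=$2
    kill -HUP "$pid" 2>/dev/null || true
    kill -TERM "$pid" 2>/dev/null || true
    # Watchdog: force the issue if the child is still there after the grace
    # period. (A careful implementation would also guard against PID reuse.)
    ( sleep "$grace"; kill -KILL "$pid" 2>/dev/null ) &
    watchdog=$!
    status=0
    wait "$pid" || status=$?       # returns as soon as the child is gone
    kill "$watchdog" 2>/dev/null || true
    wait "$watchdog" 2>/dev/null || true
}

# A stubborn child that ignores both polite signals.
sh -c 'trap "" HUP TERM; exec sleep 30' &
child=$!
sleep 1                            # let the child install its signal settings
stop_child "$child" 2
echo "exit status: $status"
```

Because the child ignores SIGHUP and SIGTERM, only the watchdog's SIGKILL ends it, and the reported status on Linux is 137 (128 plus SIGKILL's 9). A child that handles SIGTERM properly would exit during the grace period and never see the SIGKILL.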

This setting is primarily about how to detect when the user's session has ended. That matters because various per-user (not per-session) background processes, most notably dbus-daemon running in user-session mode, are meant to terminate when the user's last session ends. But is the end of the session when *all* the processes started in that session exit, or when the *main* process exits? In the former case (the previous default) you end up with sessions that never terminate because they contain processes which are waiting for those per-user background processes to exit. In the latter case a handful of programs need to be more specific about the fact that they are really per-user services and not part of a particular login session—which is the right thing to do in any case.

Note that assembling a screen.service file that runs "screen -dm" as a background service outside of the session is really rather trivial, and requires no modification to the screen binary. It can be started automatically, or manually with "systemctl --user start screen@session.service".

bash$ cat ~/.config/systemd/user/screen@.service
[Unit]
Description=screen

[Service]
Type=forking
Restart=always
ExecStart=/usr/bin/screen -dmS %I
ExecStop=/usr/bin/screen -S %I -X quit

[Install]
WantedBy=default.target
DefaultInstance=autoscreen

bash$ systemctl --user enable --now screen@.service
bash$ systemd-cgls # Amended for brevity
Control group /:
-.slice
└─user.slice
..├─user-XXXX.slice
..│..├─user@XXXX.service
..│..│..└─screen.slice
..│..│......└─screen@autoscreen.service
..│..│..........├─27015 /usr/bin/SCREEN -dmS autoscreen
..│..│..........└─27016 /bin/bash
..│..├─session-2.scope
..│..│..├─ 2216 /bin/sh /usr/bin/startkde
bash$ screen -ls
There is a screen on:
........27015.autoscreen........(06/10/2016 10:31:12 AM)........(Detached)

The screen@.service file can also be installed system-wide as part of a system package rather than in one user's ~/.config directory. Note that you still need to run "loginctl enable-linger" in order to have the screen service (or rather, the parent user@.service) survive without any active login sessions, and you need to attach to the existing screen session rather than letting screen start a new one.

Distributors ponder a systemd change

Posted Jun 12, 2016 5:59 UTC (Sun) by elvis_ (guest, #63935) [Link] (12 responses)

Yes, that seems very trivial compared to just typing "screen"

Not everyone has the time to relearn things they shouldn't have to.

Distributors ponder a systemd change

Posted Jun 12, 2016 10:02 UTC (Sun) by nybble41 (subscriber, #55106) [Link] (11 responses)

> Yes, that seems very trivial compared to just typing "screen"

From the point of view of either upstream or a distribution, it _is_ trivial. The template unit file can be provided as part of the screen package, and with a minor tweak to screen to do "systemctl --user start screen@$id.service" instead of running new servers directly, end-users wouldn't even need to be aware of the difference.

Even without support from the distribution or upstream there's not much to learn: run a few one-time commands to set things up—you don't need to understand unit file syntax, just copy the provided template into the configuration directory—and remember the one systemctl command needed to start a new session. Or take the quick route and (after running "loginctl enable-linger" once) alias screen='systemd-run --user --scope /usr/bin/screen'. If you can't handle that much you should probably avoid mainstream operating systems in general, as they all change at least this much on a regular basis. You might be more comfortable with one of the BSD variants, though even they aren't _completely_ static.

Distributors ponder a systemd change

Posted Jun 12, 2016 23:27 UTC (Sun) by elvis_ (guest, #63935) [Link] (10 responses)

Did you really just say I shouldn't use Linux because systemd breaks userspace too much? The year of the Linux desktop may have just receded into infinity... I've been using Linux since close to when you were born, my perspective is a lot different to yours I think.

Distributors ponder a systemd change

Posted Jun 13, 2016 6:18 UTC (Mon) by anselm (subscriber, #2796) [Link] (9 responses)

No, he suggested that you should use a distribution that provides the required hand-holding if you can't be bothered to learn some pretty cool new stuff like systemd (which would incidentally come in useful in a few other places, too). Not quite the same thing.

Seriously, if you're that worked up about this, we're talking about one switch that you need to flip to make the issue at hand go away, and that's only if your distribution doesn't do it for you already. We can quibble endlessly about whether changing that default was a great idea, and there are reasonable arguments on both sides – but as far as I'm concerned, “That's not how we used to do it in 1980” is one of the less reasonable arguments.

Distributors ponder a systemd change

Posted Jun 13, 2016 7:14 UTC (Mon) by jrigg (guest, #30848) [Link] (8 responses)

>No, he suggested that you should use a distribution that provides the required hand-holding if you can't be bothered to learn some pretty cool new stuff like systemd

There's a big difference between "can't be bothered" and "don't have time".

Distributors ponder a systemd change

Posted Jun 13, 2016 7:36 UTC (Mon) by anselm (subscriber, #2796) [Link] (7 responses)

That's a lame excuse if ever I heard one.

Learning the basics of systemd takes one or two hours, tops. It's not exactly rocket science. That would cover the various types of unit files, what they contain and where they're located, how service activation works, the systemctl command and its more important subcommands, and an overview of ancillary software such as journalctl or systemd-logind. It should certainly give one enough knowledge to be dangerous and to build upon incrementally as required. There's what I would consider a reasonable primer on systemd in this manual from the tuxcademy project (although I'm biased because I wrote it myself), and Lennart Poettering's blog and the documentation on freedesktop.org are also worth a peek.

Given the importance of systemd in current and future Linux systems, one would be more than justified in considering these two hours a reasonable investment (for some people it would also be worth it just to learn enough about systemd to not appear ignorant in discussions on LWN.net). Think of it as alternative entertainment on an evening when there's nothing interesting on TV.

Distributors ponder a systemd change

Posted Jun 13, 2016 10:50 UTC (Mon) by johannbg (guest, #65743) [Link] (2 responses)

Perhaps end users can cover the very basics of systemd in an hour or two, but upstreams need an in-depth understanding to be able to accept, write and maintain a proper unit file. So for upstreams (and arguably administrators as well) it takes much more time, both to grasp it and then to fully test it with their application or application stack, their infrastructure and their IT environment(s).

People who approach systemd as a new technology with new concepts adapt to it more quickly than those with a background in a legacy init system. The latter more often than not approach systemd as a legacy init system and apply legacy concepts that are not applicable to it (and expect the same or similar outcome or behavior), for example "run levels", which do not exist in systemd, whereas boot targets do.

Distributors ponder a systemd change

Posted Jun 13, 2016 12:22 UTC (Mon) by anselm (subscriber, #2796) [Link] (1 responses)

In my experience as a Linux instructor, one or two hours of systemd instruction is adequate to provide the basics for people who would otherwise be using System-V init as system administrators. Building on that, it is certainly more feasible to spend another couple of hours teaching somebody how to write a systemd service unit file for a new service and to integrate that into an existing setup, than it is to spend a couple of days teaching them enough shell scripting and distribution-specific minutiae to be able to write a robust System-V init script for a new service and to integrate that into an existing setup, on one single distribution. (The next distribution is going to be subtly, or not so subtly, different.)

For an upstream project, it is reasonable to invest the time to produce a good systemd-based configuration, which by now is likely to be applicable with few if any changes to a large number of platforms, because the effort for that is going to be smaller, in the long run, than the effort required to test and tweak new versions of their application (or application stack) on a huge number of subtly different legacy environments that all require some degree of individual adaptation.

Distributors ponder a systemd change

Posted Jun 13, 2016 16:01 UTC (Mon) by johannbg (guest, #65743) [Link]

In my experience as a Linux instructor, as well as from handling the migration of legacy SysV initscripts in the hundreds, I agree that the learning curve for systemd is gentler than the learning curve of System V, bash and shell scripting combined. But cramming all the different unit types (close to 20 now) into students' heads within an hour or two, and having them write units as well within that time frame? Let's just say you must have a more intelligent and more efficient audience, or relatively few people in class, compared to me. And the problem of subtle differences between distributions still exists, so the notion that upstream can use the same unit file across distributions does not always hold (though in many cases it will just work).

Daemon vs. socket activation, the paths used in units and simply the name of the component (apache vs. httpd, for example) still differ between distributions. That problem will never be solved unless unification of the core/base OS can be achieved, so those upstreams that actually care and ship initscripts of any kind are still dealing with that issue.

Distributors ponder a systemd change

Posted Jun 13, 2016 12:38 UTC (Mon) by jrigg (guest, #30848) [Link] (2 responses)

Learning the basics of systemd may well only take an hour or two. Keeping up with the constantly moving goalposts is another matter. Systemd's method of configuring realtime privileges for users has changed at least twice that I'm aware of (and the relevant documentation at freedesktop.org is two years out of date), for example.

It's easy to say that's purely a problem for the distros, but system upgrades are painful enough without having to learn several different versions of things in preparation for the next one.

Distributors ponder a systemd change

Posted Jun 13, 2016 20:58 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> Keeping up with the constantly moving goalposts is another matter. Systemd's method of configuring realtime privileges for users has changed at least twice that I'm aware of (and the relevant documentation at freedesktop.org is two years out of date), for example.

Your awareness is incorrect. Systemd is committed to backwards compatibility, so my units from 2012 still work just fine.

Distributors ponder a systemd change

Posted Jun 14, 2016 13:05 UTC (Tue) by jrigg (guest, #30848) [Link]

> Your awareness is incorrect. Systemd is committed to backwards compatibility

According to this, https://www.freedesktop.org/wiki/Software/systemd/MyServi... the way to allow realtime scheduling for users for a specific service is to add ControlGroup=cpu:/ to its [Service] section. The ControlGroup= option was removed in systemd 205 (July 2013) but the document hasn't been changed. That's an example of what I was referring to.

To be fair it's only one specific example, but it did contribute to my decision to stick with sysvinit-core on my Debian systems for the time being.

Distributors ponder a systemd change

Posted Jun 13, 2016 13:23 UTC (Mon) by paulj (subscriber, #341) [Link]

Make the new stuff opt-in though, and don't wilfully break the existing stuff if at all possible.

That's the basics of good system maintenance.

Distributors ponder a systemd change

Posted Jun 15, 2016 3:54 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

> If I start a process nohup, then I know what I am doing and I want it to stay around.

Most of my "nohup" processes are things like image and PDF viewers for opening attachments from mutt so that the viewers don't close when I close the tmux window. I don't think `nohup` means "should persist the session" at all; something stronger needs to be done (I have it on my TODO list to do some experimentation with a "new-session" wrapper application to start a PAM session).

Distributors ponder a systemd change

Posted Jun 7, 2016 23:58 UTC (Tue) by JoeBuck (subscriber, #2330) [Link] (3 responses)

Distros can (and will) change the default back to "don't kill" until other packages are updated as needed.

At minimum, GNU screen has to be able to persist after logout; as a software developer I rely on that feature to deal with loss of network connectivity. Likewise, background compute jobs have to work.

I don't think it is "insane" to consider this change, or even to try to push it to create pressure for other packages to live in a world where there are restrictions on background processes after logout. But this is why it is good that the distros sit between the upstream and the end users; the change can only be delivered after the use cases are worked out in detail.

Distributors ponder a systemd change

Posted Jun 9, 2016 22:44 UTC (Thu) by jeff@uclinux.org (guest, #8024) [Link] (2 responses)

"Distros can (and will) change the default back to "don't kill" until other packages are updated as needed."

No, this breaks the expectations of the POSIX C runtime environment. If I write code to run on a supposedly compliant system by managing signals myself, I absolutely require that to work.

Someone with quite a bit of experience is noted for saying something like "UNIX doesn't have all the good ideas, just most of them". I rely on correct behaviour of signals, they were a good idea, and linking with some dumbass library to get behaviour I had before does not qualify as rational, let alone a good idea.

Fail.

Distributors ponder a systemd change

Posted Jun 9, 2016 23:09 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

In other words: "I've been driving down this road for the last 50 years and I don't care if you say there's a sinkhole ahead!"

Distributors ponder a systemd change

Posted Jun 16, 2016 16:12 UTC (Thu) by Wol (subscriber, #4433) [Link]

> Someone with quite a bit of experience is noted for saying something like "UNIX doesn't have all the good ideas, just most of them".

Well, as someone with quite a bit of non-Unix experience (and no, that's not just Microsoft), ime Unix/linux has quite a lot of stupid ideas too. And given Linus' attitude of "We're posix compliant if posix makes sense", he probably thinks the same.

Simple example - "cp a b". What's that going to do? Oh, and I don't want a long winded answer with a load of "if"s in it :-)

Whereas on Pr1mos, "copy a b" gives me an exact copy of a (security permissions permitting) called b.

And ime, most of these comments seem to be bikeshedding between two camps - the "sooner the better" camp, and the "the right time is never" camp. If it's going to happen, then now is as good a time as any. How long has this change been in the works? Since the dawn of systemd? And if all these programs - screen, tmux, nohup, haven't done anything about it yet, then they're not going to unless something gives them a kick up the bum.

Cheers,
Wol

Distributors ponder a systemd change

Posted Jun 8, 2016 0:10 UTC (Wed) by Nahor (subscriber, #51583) [Link] (24 responses)

How is running a process while logged out any more risky than running it while logged in? Can someone explain?

Distributors ponder a systemd change

Posted Jun 8, 2016 0:49 UTC (Wed) by TMM (subscriber, #79398) [Link] (15 responses)

You can run a server in a screen that can give you access back to the system after you log out (or nohup or whatever) meaning it is not immediately obvious when/if you have successfully removed a user from your system even if they are not currently logged in.

It's possible, but hard. (The screen thing is just an example, actually making it hard requires more than screen, but this systemd change takes care of this entire class of problem)

Distributors ponder a systemd change

Posted Jun 8, 2016 1:06 UTC (Wed) by smoogen (subscriber, #97) [Link] (1 responses)

But does it? You have to block cron, at, system level containers, and all their ilk to actually make sure that a service doesn't fire up after a user logs out. And in the primary work case where you have a user able to log in remotely, they also need to be able to use these sorts of system level services so it doesn't stop their setting up a reverse nc shell or stop someone from piggybacking on the ssh multiplex and making sure the account never truly logs out.

I understand the security item that Lennart sees, but I think that this is a bandaid where the 'fix' he wants will require him to write his own distribution from the 'ground-up' and find the users and use cases to use it. He gets angry about the amount of band-aids he is already carrying around, but this is in many ways the fact that the users already have too many of the old around and can not just fork lift fix their infrastructure at his urging.

Distributors ponder a systemd change

Posted Jun 16, 2016 16:18 UTC (Thu) by Wol (subscriber, #4433) [Link]

> He gets angry about the amount of band-aids he is already carrying around, but this is in many ways the fact that the users already have too many of the old around and can not just fork lift fix their infrastructure at his urging.

The problem is that your "band-aid" is Lennart's security hole. All these band aids are unnecessary code, that is more likely than average to harbour bugs (and hide bugs in the other program too), and are dangerous things to leave lying around. And this example is classic - leaving processes lying around because the system can't/won't get rid of them by default is exactly that! If they're buggy enough not to shut down, how many other bugs do they harbour?

Cheers,
Wol

Distributors ponder a systemd change

Posted Jun 8, 2016 1:42 UTC (Wed) by dskoll (subscriber, #1630) [Link] (3 responses)

The security issue is a red herring. If you want to make sure that all of a user's processes have been terminated, you use pkill -U and pkill -u. Distros could even make their user-deletion tools ask if you want to do this and then do it for you if you say yes.
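
As a rough illustration of the selection that `pkill -U` (real UID) and `pkill -u` (effective UID) perform, here is a minimal Python sketch that lists the PIDs owned by a given UID. It assumes a Linux `/proc` filesystem; the function name `pids_owned_by` is made up for this example, and it only lists processes rather than signalling them.

```python
import os


def pids_owned_by(uid, proc_root="/proc"):
    """Return the PIDs whose real or effective UID equals uid (Linux /proc only)."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                for line in fh:
                    if line.startswith("Uid:"):
                        # Fields after the label: real, effective, saved, filesystem UID.
                        real, effective = (int(x) for x in line.split()[1:3])
                        if uid in (real, effective):
                            matches.append(int(entry))
                        break
        except OSError:
            continue  # the process exited or is hidden from us
    return sorted(matches)


if __name__ == "__main__":
    print(pids_owned_by(os.getuid()))
```

An administrator's cleanup script could then signal each listed PID, which covers every process of the user regardless of which session started it.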

Distributors ponder a systemd change

Posted Jun 8, 2016 2:23 UTC (Wed) by hmh (subscriber, #3838) [Link] (2 responses)

Even that is too little. You also need to ensure no changes were left that could allow such users to have access to something they shouldn't have -- unless you are actually purging the user from the system ("userdel" style).

I seem to recall a full fix for the "desktop" *security* case requires something like a revoke() syscall, and a proper SAK implementation. I hope I am wrong, because those can be quite hard to implement.

I mean, what is the point of killing every job started in a user session in the name of security, when all the user has to do is to simulate a login screen without ending his session at all in the first place?

There are other ways to ensure no desktop environment components are left behind when the session is closed, especially because these are actually a bounded set *and* because there would be little reason for them to refuse the patches.

Distributors ponder a systemd change

Posted Jun 8, 2016 8:27 UTC (Wed) by NAR (subscriber, #1313) [Link]

> unless you are actually purging the user from the system ("userdel" style).

Well, if I want a user gone from the system, I'd want to do a decent job and purge it. Remove user id, cron jobs, all files, kill all processes, whatnot. I admit it might be complicated in a corporate multi-computer system when a user can have access to dozens of computers with some kind of single sign on - but systemd wouldn't help with the cronjobs either. Malicious users can also easily avoid the systemd reaping procedure. So I do think that the "security" issue is totally BS.

Distributors ponder a systemd change

Posted Jun 8, 2016 12:03 UTC (Wed) by dskoll (subscriber, #1630) [Link]

> Even that is too little.

Yes, I know, but it is the equivalent (actually, a superset of) what systemd does when it kills processes started in the login session.

Distributors ponder a systemd change

Posted Jun 8, 2016 2:08 UTC (Wed) by Nahor (subscriber, #51583) [Link] (8 responses)

If the server runs under the user's id, it's easy to find.
If the server runs under its own account, then it doesn't matter if the user is logged in or not. The user could start it while staying logged in, then pretend nothing nefarious is happening (ssh is idle, terminal is idle, ...) while doing his evil things using the server.

To me, it seems that if you can't trust your users while they are logged out, you shouldn't trust them while they are logged in either.

Moreover, there is still the "systemd-run" function to do the same thing as nohup/screen/... so it's not that they are removing the feature anyway, it's just used differently.

Distributors ponder a systemd change

Posted Jun 8, 2016 4:34 UTC (Wed) by rahvin (guest, #16953) [Link] (7 responses)

You're simplifying the problem. A zombie user process is a threat precisely because no one is actually in control of it making it available for anyone to potentially exploit with the user that started it no longer monitoring it. You don't have to distrust your users for that to be a security issue. I can think of hundreds of ways that could be bad even with a completely trusted user because everyone makes mistakes, zero day exploits exist, and zombie processes are an avenue for all kinds of mischief.

What if you had an orphaned ssl process when the heartbleed vulnerability was disclosed? Even if you patch the binary if you don't shutdown the zombies you just exposed your key to the world. IMO this is a good change to enable default security but I also totally agree that this should have been talked about very publicly that they were going to turn it on with X release. By not doing this they sabotaged their own effort because all the distributions will just disable it and boom it will never get implemented as default. And these programs like tmux and screen whose primary purpose is to daemonize a process (and have been around for more than 30 years) should have not only been warned but the systemd developers should have found a good way to retain that functionality that wouldn't require a dependency (in an upstream that's agnostic to the OS) to fix, even if that is a default exception for those specific binaries.

tldr It's a good change but it was handled terribly.

Distributors ponder a systemd change

Posted Jun 8, 2016 4:41 UTC (Wed) by drag (guest, #31333) [Link] (4 responses)

A zombie process is one that has terminated and is waiting for its parent to reap its exit status.

There is no threat from them. They are not running anymore. They may be using up some memory in a process table or something, but that is about it. All it really means is that if you end up with a bunch of zombie processes you have a bug in the kernel or init or something.
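
To make the zombie state concrete, here is a small Python sketch, assuming Linux and a mounted `/proc`. The helper names are invented for the example. It forks a child that exits at once, watches the child's state become `Z` while the parent has not yet waited for it, and then reaps it.

```python
import os
import time


def process_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as fh:
        data = fh.read()
    # The command name is parenthesised and may contain spaces,
    # so split after the last ')' before reading the state field.
    return data.rsplit(")", 1)[1].split()[0]


def make_zombie_then_reap():
    """Fork a child that exits; return (state before reaping, listed after reaping)."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running cleanup handlers
    # Parent: deliberately do not wait yet, so the child lingers as a zombie.
    deadline = time.monotonic() + 5.0
    state = process_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = process_state(pid)
    os.waitpid(pid, 0)  # reap: the kernel can now drop the process table entry
    return state, os.path.exists(f"/proc/{pid}")


if __name__ == "__main__":
    print(make_zombie_then_reap())
```

The zombie executes no code in the meantime; only its process table entry remains until the parent collects the exit status.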

Distributors ponder a systemd change

Posted Jun 8, 2016 8:57 UTC (Wed) by sbakker (subscriber, #58443) [Link] (3 responses)

I suspect rahvin meant "orphaned" rather than "zombie". Nevertheless, the term "zombie" is confusing, implying that the entity is somehow still active. It isn't. In fact:

“This process is no more. It has ceased to be. It's expired and gone to meet its maker. This is a late process. It's a stiff. Bereft of life, it rests in peace. If its parent had reaped its exit status, it wouldn't be occupying a process slot, but pushing up the daisies instead. It's rung down the curtain and joined the choir invisible. This is an ex-process.”

A better name for zombie processes would be "corpse" or "carcass", but that just looks wrong in "top" listings. "Dead parrot" has a nice ring to it though.. :-)

Tasks: 304 total,   1 running, 303 sleeping,   0 stopped,   0 dead_parrot

Distributors ponder a systemd change

Posted Jun 8, 2016 15:44 UTC (Wed) by drag (guest, #31333) [Link] (2 responses)

Yes, I figured we might as well be consistent in what we are talking about here. Zombie is a very commonly used term.

An 'orphaned' process is essentially a daemon.

The problems we are running into now is that in Unix-ville every application was tied to a TTY. The 'getty' process was the 'session manager'. However it's complete shit to work in a modern environment by 'backgrounding processes'. So using daemons for setting up user environments and such things are very common. So is using terminal multiplexers like tmux or screen.

These are really just bandages to a bigger issue. All these things are just workarounds to the 'tty' limitations. As a result it's a big mess. You can control when you start up all these processes, but they are no longer tied to anything. You have to have some process to come back in when you log out to clean them up.

With systemd we now have the ability to have a true user session that is no longer tied to a particular tty. Systemd can 'daemonize' processes without actually going through the traditional daemonizing process. It makes sense that when the user session dies so do their programs. It's just the way things should have always worked.

Distributors ponder a systemd change

Posted Jun 8, 2016 21:02 UTC (Wed) by rahvin (guest, #16953) [Link] (1 responses)

I did mean orphaned. I had one of those moments where I couldn't think of the right word so I used a different one.

I completely agree this is a needed change. The problem is it breaks 30 years of experience, and those types of breaks need lots and lots of warning or people will put workarounds in place that negate the change. Every distribution disabled it, IMO that's a sign they handled the messaging wrong.

Distributors ponder a systemd change

Posted Jun 8, 2016 23:35 UTC (Wed) by johannbg (guest, #65743) [Link]

"IMO that's a sign they handled the messaging wrong."

If by "they" you mean downstream distribution package maintainers of systemd you might be right depending on which community you reside in.

If by "they" you mean the upstream systemd community, it followed the standard announcement procedure like most projects do, and that even got picked up and announced here on lwn [1].

Unlike many upstream projects, there is also always a last call [2] open for a couple of days before each upcoming release (in case someone might have missed something or wants to discuss some change before release). And for some facts (since some people might find those relevant): the change was committed in early April, and it was Zbyszek who was pushing for this change, not Lennart (but apparently the pattern is to blame Lennart or Gnome for all the world's problems).

1. https://lwn.net/Articles/688640/
2. https://lists.freedesktop.org/archives/systemd-devel/2016...

Distributors ponder a systemd change

Posted Jun 8, 2016 12:03 UTC (Wed) by itvirta (guest, #49997) [Link]

> What if you had an orphaned ssl process when the heartbleed vulnerability was disclosed?
> Even if you patch the binary if you don't shutdown the [running processes] you just exposed your key to the world.

That's a problem with all binary / library upgrades (libc and static binaries too). You need a way to find out whether a process is still using the old binary, and for long-running services that needs to be done during the upgrade, not at some arbitrary point in the future, like someone's logout.
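
One common way to spot such processes on Linux is to look for file mappings marked `(deleted)` in `/proc/<pid>/maps`: when a package upgrade replaces a library, processes that still map the old file show its path with that suffix. The sketch below only parses the text of a maps file. The function name is made up for the example, and it can report false positives (for instance deleted temporary files that were mapped on purpose).

```python
def deleted_mappings(maps_text):
    """Return the sorted paths of file mappings marked '(deleted)' in maps text."""
    suffix = " (deleted)"
    stale = set()
    for line in maps_text.splitlines():
        # Columns: address, permissions, offset, device, inode, pathname.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a pathname
        path = fields[5]
        if path.startswith("/") and path.endswith(suffix):
            stale.add(path[: -len(suffix)])
    return sorted(stale)


def stale_files_of(pid):
    """Read /proc/<pid>/maps and report the replaced files it still maps."""
    with open(f"/proc/{pid}/maps") as fh:
        return deleted_mappings(fh.read())
```

Running `stale_files_of` over every PID right after an upgrade would give a list of services and user processes that need a restart, independent of anyone logging out.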

Distributors ponder a systemd change

Posted Jun 8, 2016 16:07 UTC (Wed) by Nahor (subscriber, #51583) [Link]

You misunderstand my question. My question is not about why it's dangerous to keep an orphaned process running but why it's *more* dangerous than a non-orphaned process. Why is it risky to keep a process running while logged out but it would be acceptable to keep that same process running with a user logged in 24/7?

Distributors ponder a systemd change

Posted Jun 8, 2016 1:59 UTC (Wed) by droundy (subscriber, #4559) [Link]

Another scenario where this would help would be if I do not have sshd running and you login to my computer and then log out. I might think that when I log in myself that you won't be able to sniff my passwords. This change would protect me in that instance. Wayland would alternatively protect me from this particular attack, but I can imagine others where you could learn secrets just by watching ps.

Distributors ponder a systemd change

Posted Jun 8, 2016 3:24 UTC (Wed) by smurf (subscriber, #17840) [Link] (6 responses)

A process that lingers can do all sort of destructive things, like surreptitiously fill /tmp at 2am when the backup process runs, or temporarily fill /var/log to hide the fact that something tries to brute-force your mysql password.
There are also side-channel attacks that can break (some) cryptography. Or your background process can keep the microphone open and thus recover what's being typed.
Lots of possibilities for the enterprising miscreant.

Distributors ponder a systemd change

Posted Jun 8, 2016 6:39 UTC (Wed) by dd9jn (✭ supporter ✭, #4459) [Link] (3 responses)

This is done via cron (crontab -e, batch, at) and mitigated with quotas. Every Unix introduction tells you at least the former.

Distributors ponder a systemd change

Posted Jun 8, 2016 8:31 UTC (Wed) by matthias (subscriber, #94967) [Link] (2 responses)

crontab -e, batch and at are only possible, if the administrator allows for them. The same should hold true for user processes lingering around after the user session is closed.

Distributors ponder a systemd change

Posted Jun 8, 2016 10:46 UTC (Wed) by hmh (subscriber, #3838) [Link] (1 responses)

It is possible to do that, and it has been possible to do that since day one: you pkill/killall after logout. There are a number of ways to hook a script to session end/logout.

I should add that pkill/killall can actually implement the "no user processes left" behavior, unlike the new systemd functionality, which is about "no processes started by this session are left when the session ends". Two very different things, but people seem to want to claim the new systemd behavior is actually useful for security, when it is *completely useless* for that, so it looks like we need to point out the utterly obvious...

The new behavior is a best-effort house cleanup thing, nothing more. And an unwelcome one *as implemented right now* at that, because it causes too much collateral damage for very little gain. The old behavior, where one would explicitly enable the functionality where useful, was a lot better.

Distributors ponder a systemd change

Posted Jun 8, 2016 11:30 UTC (Wed) by matthias (subscriber, #94967) [Link]

As far as I understand, the systemd functionality should ensure that no user processes are left after the last user session has exited, unless the administrator allows otherwise (systemd linger functionality, allowing at, batch or cron jobs for the user).

Having really no processes survive when a session ends makes no sense at all. If I open two sessions and log out of one of them, the second session would be killed. If every session takes care to kill its own processes, no process should survive.

For myself, I like the new behaviour. Not because of security, but because I think that it is the job of session management to do some clean-up. Of course this means that screen/tmux/nohup should get changed to work again. Once these few programs are fixed, there should not be much collateral damage. Before that, I do not expect this change to hit stable distributions, anyway.

Distributors ponder a systemd change

Posted Jun 8, 2016 16:13 UTC (Wed) by Nahor (subscriber, #51583) [Link] (1 responses)

A process that lingers is a danger regardless if you're logged in or not... If you force a user to stay logged in 24/7 so his processes don't get killed, then you're adding a live terminal and possibly a live SSH connection to the attack surface

Distributors ponder a systemd change

Posted Jun 8, 2016 16:26 UTC (Wed) by anselm (subscriber, #2796) [Link]

Which is presumably why systemd very sensibly does not force that situation, but gives the administrator a variety of tools to specify whether users get to keep processes running after they log out or not, and if so which users.

Distributors ponder a systemd change

Posted Jun 8, 2016 4:56 UTC (Wed) by drag (guest, #31333) [Link] (38 responses)

My personal opinion is that it's not a bad thing because it makes for a tidier system. It's 'more correct'.

What it actually accomplishes is that it helps systemd be a better session management for users. When users log out they don't usually want to have a bunch of processes lingering unless they explicitly expect it. Systemd brings the session into the world and it should be the one to take it out.

What I do NOT like is the fact that you have to link to a new dependency to have a program indicate to systemd that it needs to keep running. I especially don't like the idea of tying it into PAM or any such thing. And tmux/screen are not the only types of programs that I may want to linger around. I may want IRC bots, for example. Or have a program that collects and indexes my mail. Or some protein folding application, or whatever.

I should be able to tell systemd that I want a program to last past logout. I should be able to do this with an entry in a service file or a systemctl command.

This way I can take full advantage of systemd to handle 'daemonizing' and logging and all that happy stuff without any effort, which will make things much simpler for me. I can program in whatever language I feel without having to figure out how to link it to a new library. And I still will be able to retain the benefits of having systemd kill any lingering process automatically that I don't explicitly tell systemd to leave alone.

To conclude:

I figure just A) add an option to systemctl/service files. B) make it possible for admins to disable the linger feature completely.. and then you have the problem largely solved. Distributions can wrap 'tmux' or 'screen' in a shell script to retain the old behavior if they want. Nobody will have to make significant changes to their programs to fit into systemd either.

Distributors ponder a systemd change

Posted Jun 8, 2016 7:34 UTC (Wed) by ras (subscriber, #33059) [Link] (37 responses)

> When users log out they don't usually want to have a bunch of processes lingering unless they explicitly expect it.

'nix has always fulfilled this expectation, since at least V7. When the user logged out every process in the login session was sent a SIGHUP. If the process wants to hang around it had to intercept the signal, since the default was to kill it. (And yes, I know you know this.)

If the user wanted it to hang around he had to use nohup or an equivalent. As far as I can tell that won't change under this regime, albeit nohup will have to jump through different hoops.

Inexplicably, they look to be worse hoops. Before a program could distinguish between the user logging out, (SIGHUP), and the user asking the program to exit (SIGTERM). So for example if I deliberately left a process running after logging out by masking SIGHUP, I could ask it nicely to exit by sending it a SIGTERM later. Now that distinction is gone.
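
The SIGHUP/SIGTERM distinction described above can be sketched in Python, assuming a POSIX system. The child is started with SIGHUP either ignored (roughly what `nohup` arranges) or left at its default; the helper names are invented for this example.

```python
import signal
import subprocess
import sys

# A stand-in for a long-running job.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(30)"]


def _dispositions(hup_action):
    """Build a pre-exec hook that sets SIGHUP and resets SIGTERM in the child."""
    def apply():
        signal.signal(signal.SIGHUP, hup_action)
        signal.signal(signal.SIGTERM, signal.SIG_DFL)
    return apply


def hangup_then_terminate(ignore_hup):
    """Send SIGHUP, then SIGTERM if the child survived; return (survived, returncode)."""
    action = signal.SIG_IGN if ignore_hup else signal.SIG_DFL
    child = subprocess.Popen(SLEEPER, preexec_fn=_dispositions(action))
    child.send_signal(signal.SIGHUP)  # what a closing login session delivers
    try:
        child.wait(timeout=1.0)
        survived = False
    except subprocess.TimeoutExpired:
        survived = True
        child.send_signal(signal.SIGTERM)  # the explicit "please exit" request
        child.wait(timeout=5.0)
    return survived, child.returncode


if __name__ == "__main__":
    print(hangup_then_terminate(True))
    print(hangup_then_terminate(False))
```

A negative return code from `subprocess` is the number of the signal that ended the child, so the two cases show which signal was actually fatal.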

There is a second change: the option to send a SIGKILL if the process doesn't respond to SIGHUP. I can think of situations that might be useful, although it is a stretch - it isn't in any systemd installation I'm familiar with.

A PAM session management plugin would be the cleanest way to implement it. For example, I don't recall ever wanting to leave something running when I exit a GUI session on my laptop, but if I ssh into a server and start a tmux session, it had better damned well still be running when I log out. PAM can already distinguish between these cases - no extra code required. Sending the SIGKILL to wayward processes after a grace period would only be a few lines of code.

And there is a third change: to flip the default from not sending the SIGKILL to sending a SIGKILL for everything. To re-iterate what I said above, that would be perfectly reasonable if most installations had policy-abusing users, and so sysadmins found themselves having to change the default on most machines they configured. But given no one has bothered to write the PAM plugin in the last decade I doubt rogue processes running after logout are a serious problem. On the other hand, I can tell you because the current implementation can't distinguish between session types I personally would have to turn it off on every install I do. And I don't know anybody that doesn't apply to.

Finally, I am sure someone will argue that SIGHUP clearly doesn't work because there are occasionally rogue processes left around on logout. But they are only hanging around because someone has screwed up the session tracking (in which case this new solution won't work either), or because they are deliberately ignoring SIGHUP for some reason. Presumably the reason will remain after this change, so they too will alter their programs to jump through the new hoops. And so, after the while it takes everyone to adapt, we will be back where we started.

Distributors ponder a systemd change

Posted Jun 8, 2016 8:42 UTC (Wed) by matthias (subscriber, #94967) [Link] (5 responses)

There is another possibility: intercepting SIGHUP to do a clean shutdown of a program and then running into a bug. I have seen left-over processes just consuming 100% cpu time.

The session management has to know whether a process should survive. If a process shall not survive, it has to be SIGKILLed if the usual shutdown procedure fails for whatever reason. We cannot expect all programs to be bug-free.
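
The "ask nicely, then SIGKILL" escalation described here can be sketched as follows. This is only an illustration in Python on a POSIX system, not how systemd implements it; `spawn_sleeper` and `stop_with_grace` are hypothetical names, and the stubborn child merely ignores SIGHUP to stand in for a program whose shutdown path is stuck.

```python
import signal
import subprocess
import sys


def spawn_sleeper(ignore_hup):
    """Start a long-running child; optionally make it ignore SIGHUP."""
    action = signal.SIG_IGN if ignore_hup else signal.SIG_DFL
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        # Runs in the child before exec, so the disposition is set from the start.
        preexec_fn=lambda: signal.signal(signal.SIGHUP, action),
    )


def stop_with_grace(child, grace_seconds):
    """Send SIGHUP; if the child is still alive after the grace period, SIGKILL it."""
    child.send_signal(signal.SIGHUP)
    try:
        child.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        child.kill()  # SIGKILL cannot be caught or ignored
        child.wait()
    return child.returncode


if __name__ == "__main__":
    print(stop_with_grace(spawn_sleeper(ignore_hup=True), 0.5))
    print(stop_with_grace(spawn_sleeper(ignore_hup=False), 0.5))
```

The grace period is the only protection against the data loss that an immediate SIGKILL can cause, so its length is a policy decision rather than a technical one.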

Distributors ponder a systemd change

Posted Jun 8, 2016 16:13 UTC (Wed) by ballombe (subscriber, #9523) [Link]

What about data loss when the program receives the SIGKILL before it has completed its shutdown?

Distributors ponder a systemd change

Posted Jun 10, 2016 16:17 UTC (Fri) by azumanga (subscriber, #90158) [Link] (3 responses)

But most processes that intercept SIGHUP want to stay alive, so are just all going to switch to telling session-management they want to survive?

Distributors ponder a systemd change

Posted Jun 10, 2016 17:49 UTC (Fri) by matthias (subscriber, #94967) [Link] (2 responses)

Simply not true. Every process doing file IO should intercept SIGHUP to ensure not terminating in the middle of some IO and producing garbage. If such a process freezes for some reason, the normal way of terminating will not work and the process is waiting to be SIGKILLed, either by the user, at session end or during shutdown. I have seen such processes. SIGHUP is simply a signal saying please terminate, your session has ended. Up to now session management was assuming that this always works. With this change systemd will clean up processes where this did not work.

Which processes do you have in mind that would be changed? Obviously tmux and screen. Anything else? Processes that are actually daemons do not count (they are out of scope anyway as they do not belong to the session). Neither do processes count that are explicitly backgrounded by the user. For these processes the user will make sure that session management does not kill them (starting with a wrapper like systemd-run).

Distributors ponder a systemd change

Posted Jun 10, 2016 18:05 UTC (Fri) by viro (subscriber, #7872) [Link]

... assuming they knew they'll need these processes to outlive the session back when they were starting said processes. Unfortunately, the situations when it's not true tend to be of the "lots of time already went into computation and kill&restart the right way is very unappealing" variety.

Distributors ponder a systemd change

Posted Jun 11, 2016 10:53 UTC (Sat) by ras (subscriber, #33059) [Link]

> Simply not true. Every process doing file IO should intercept SIGHUP to ensure not terminating in the middle of some IO and producing garbage. If such a process freezes for some reason, the normal way of terminating will not work and the process is waiting to be SIGKILLed, either by the user, at session end or during shutdown. I have seen such processes. SIGHUP is simply a signal saying please terminate, your session has ended. Up to now session management was assuming that this always works.

All accurate.

> With this change systemd will clean up processes where this did not work.

Sadly systemd isn't psychic and so can't currently distinguish between when it didn't work and when it didn't matter. But as you observe, there are only a few known programs this affects. So they could be patched. And I am willing to concede the in-house programs broken by this change don't matter. That seems to be business as usual in open source - I get screwed over by non-backward-compatible API changes on a fairly regular basis.

But this always cleaning up processes where "it didn't work": not a good idea. The default should be not to hide the problem. Just in case you don't know: "didn't work" is a bad thing. It's caused by a bug. It is better for all of us if we get that bug fixed ASAP. If processes hanging around caused most of us a lot of pain, then maybe you would have a point. But we lived with it for 30 years, so the pain can't be that great.

Nonetheless as you have pointed out most is not all, and in particular this behaviour has caused you real pain. Fair enough. I hope nobody would argue with it providing a solution for you and anybody else that has it.

My issue: that solution has existed for 15 years.

Distributors ponder a systemd change

Posted Jun 8, 2016 17:41 UTC (Wed) by drag (guest, #31333) [Link] (7 responses)

> 'nix has always fulfilled this expectation, since at least V7. When the user logged out every process in the login session was sent a SIGHUP. If the process wants to hang around it had to intercept the signal, since the default was to kill it. (And yes, I know you know this.)

What if you consider a 'user session' to be system-wide instead of tied to a specific TTY?

I want to start processes that I can connect to and use regardless of how I log into a system, but when I am gone from the system completely I want them to be gone as well. Except in specific cases when I don't.

An example of this is Emacs.

Right now I use Emacs in 'daemon mode'. I would like to be able to connect and use it from different logins. I should be able to launch new client windows from another system over SSH and I would like to be able to use it from local login as well as X. When X dies I don't want it necessarily to be killed along with X if I happen to be using it from SSH for example.

But I also use Emacs for sensitive things. I have gpg-encrypted files that I open in Emacs for old passwords and things like that. So I _REALLY_ don't want Emacs sitting there running forever when I log out completely. It's far better for me to have to re-start emacs than have it sitting there with all my passwords decrypted in memory if I get forcefully disconnected from it.

I also want the behavior to be the same regardless of how I happen to launch Emacs. If I launch it from a terminal emulator I want its session behavior to be the same as if I launched it from ssh or from an X application menu.

> A PAM session management plugin would be the cleanest way to implement it.

How is a PAM session management plugin going to know which programs I want dead and which ones I don't?

Distributors ponder a systemd change

Posted Jun 8, 2016 23:17 UTC (Wed) by ras (subscriber, #33059) [Link] (6 responses)

> How is a PAM session management plugin going to know which programs I want dead and which ones I don't?

It can't easily know. (Easily being the operative word. It's just code, after all. It could say read ~/.kill-on-logout.lua, and use some function in there to decide.)

But I'm missing something here. How can any other solution easily know? Or perhaps you are focusing on the 1 -> 0 sessions trigger I gather the KillUserProcesses solution uses. If so, obviously the PAM plugin could use the same trick.

Sorry, I don't really understand your question.

Distributors ponder a systemd change

Posted Jun 9, 2016 4:49 UTC (Thu) by drag (guest, #31333) [Link] (5 responses)

> How can any other solution easily know?

Well now I have the feeling that I was missing something.

You just tell systemd you want the process to live via 'systemd-run --user', or launch the program via a '--user' service file with a 'linger' option. But I suspect that is what the behavior is now for '--user' for 230 anyways. I'll need to play around with it I guess.

If it is true you can cause a process to linger just with an invocation of systemd-run or by defining a service file, then my concerns/issues/questions are already all addressed.
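As a rough illustration of the wrapper idea, here is a minimal sketch that only builds the command line and does not run anything. `systemd-run --scope --user` and `loginctl enable-linger` are real commands, but whether a scope started this way outlives the last logout depends on the systemd version and on whether lingering is enabled for the user, so treat that as something to test rather than a guarantee. The helper name `build_scope_command` is made up for this example.

```python
import shlex


def build_scope_command(command, use_user_manager=True):
    """Return an argv list that wraps `command` in a transient systemd scope.

    Hypothetical helper: it only assembles the arguments. Survival past
    logout may additionally require `loginctl enable-linger` for the user.
    """
    argv = ["systemd-run", "--scope"]
    if use_user_manager:
        argv.append("--user")
    return argv + list(command)


wrapped = build_scope_command(["tmux", "new-session", "-d"])
print(" ".join(shlex.quote(part) for part in wrapped))
```

Passing the resulting list to `subprocess.run` would actually launch it; that step is left out here on purpose.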

Yeah, so I don't know.

When you can strip out most of the daemonization portions of a program and replace them with a simple <progname>.service text file or a wrapper shell script that calls 'systemd-run'.... and get results superior to what was possible before, I suspect that is the way to go rather than screwing around with pam, calls to dbus, or anything else. The simple solution is usually the better one.

I am starting to suspect that this is all just one huge non-issue as far as the technical issues are concerned, the problems are solved and the work needed by the developers is actually simplified. It's the social aspect of it that is the problem. People don't like to see change. I am not trying to downplay the issue, though. The social aspects are important.

Distributors ponder a systemd change

Posted Jun 9, 2016 5:43 UTC (Thu) by ras (subscriber, #33059) [Link] (4 responses)

> I am starting to suspect that this is all just one huge non-issue as far as the technical issues are concerned, the problems are solved and the work needed by the developers is actually simplified.

Now I'm lost. The work required by developers under the old scenario was one of:

- None: if the user wants it to run after logout, he uses "nohup command" or uses tmux or something.

- signal(SIGHUP, SIG_IGN)

How on earth could it get any easier than that?

Sure, it doesn't handle the emacs scenario you mentioned, but that seems to be pretty esoteric. It's very rare for me to have two logins on one machine, let alone a burning desire to share an editor session that I want killed when I log out of all sessions. Breaking backward compatibility, and adding all this complexity for the sake of something so specialised seems like a very odd design decision. It only gets odder when you know a few lines of a Python PAM module could do it.
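For readers who haven't seen it, the second option looks like this when written in Python rather than C. This is a minimal sketch of the traditional mechanism only; it says nothing about what a session manager that follows up with SIGKILL will do. The function name `ignore_hangup` is invented for the example.

```python
import signal


def ignore_hangup():
    """Ignore SIGHUP for this process and return the previous disposition."""
    return signal.signal(signal.SIGHUP, signal.SIG_IGN)


# Called once, early in the program (must run in the main thread).
previous_disposition = ignore_hangup()
```

The disposition SIG_IGN is inherited across fork() and preserved across exec(), which is essentially how nohup works.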

Distributors ponder a systemd change

Posted Jun 9, 2016 7:33 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

What if a user _does_ NOT want a process to survive a logout?

The scenario is dead simple - you logout, you log in and get two copies of a process that should exist in only one instance.

Distributors ponder a systemd change

Posted Jun 10, 2016 0:57 UTC (Fri) by ras (subscriber, #33059) [Link]

> What if a user _does_ NOT want a process to survive a logout?

Two scenarios:

The program does not have a bug, which means if it survived logout he ran it under screen or similar. Solution: don't run it under screen.
The program has a bug, and he files a bug report. In the meantime kill it maybe? It's not difficult.

And so now you say, but this is affecting a large cluster of machines the unwashed masses use - and they aren't going to do the kill. So the sysadmin that looks after them has to interrupt his tea break on occasion - until the bug fix arrives. As a part time sysadmin myself, I have to concede this is indeed a very serious situation.

Fortunately, there is a workaround. A short PAM module:

import os, signal, sys, time

def pam_sm_open_session(pamh, flags, argv):
    return pamh.PAM_SUCCESS

def might_fail(func, default=None):
    # Processes can vanish while we scan /proc, so treat I/O errors as "no answer".
    try:
        return func()
    except EnvironmentError:
        return default

def session_pids():
    # PIDs of live (non-zombie) processes whose /proc sessionid matches ours.
    my_pid = str(os.getpid())
    get_proc = lambda pid, name: open("/proc/%s/%s" % (pid, name)).read()
    my_session = get_proc(my_pid, "sessionid")
    return (
        int(pid) for pid in os.listdir("/proc")
        if pid[0] >= '0' and pid[0] <= '9' and pid != my_pid
        if might_fail(lambda: get_proc(pid, "sessionid")) == my_session
        if not "Z (zombie)" in might_fail(lambda: get_proc(pid, "status"), ""))

def kill_all(sig):
    for pid in session_pids():
        might_fail(lambda: os.kill(pid, sig))

def pam_sm_close_session(pamh, flags, argv):
    # Ask politely, poll for up to about 5 seconds, then force.
    kill_all(signal.SIGTERM)
    for i in range(50):
        if not any(session_pids()):
            return pamh.PAM_SUCCESS
        time.sleep(0.1)
    kill_all(signal.SIGKILL)
    return pamh.PAM_SUCCESS

Job done! Well maybe. Comparing sessions works fine for ssh, but GNOME creates many of them now. To fix that the easiest kludge would be to kill all the user's processes when all of his sessions are gone. It would be hacky and racy - but this is just a kludge until the real bug fix comes in. It's a pity that's exactly what the real bug fix does.

Distributors ponder a systemd change

Posted Jun 9, 2016 16:31 UTC (Thu) by drag (guest, #31333) [Link] (1 responses)

> Sure, it doesn't handle the emacs scenario you mentioned, but that seems to be pretty esoteric.

These sorts of command-control programs are becoming more common. Tmux is an example. Emacs is an example. But there are others. Gnome-terminal uses a terminal daemon to manage things. Urxvt has the ability to use Urxvtd as well. With X they are limited to a particular login, but is that same limitation going to exist for Wayland? Is it going to be possible to run the program independent of the display and connect to it?

Then there are things like pulseaudio, mpd (music player daemon), irc bots, irc clients, IM bots, email weirdness, etc etc. These are programs you launch, leave floating around, and then connect to using a different process. In the future you'll start running into more AI stuff like Mycroft, where you have 'helper' programs that the user interacts with: open source versions of Siri, 'hello google' or whatever.

They would all work just a bit better if they were tied to a user being logged into a machine, but not to a specific login.

What I think is going to happen is that we are running into a 'long tail' situation. Each of these things is esoteric, but there are a whole lot of people wanting to do their own esoteric thing.

At this point I really don't know. I'll have to play around with it.

Distributors ponder a systemd change

Posted Jun 10, 2016 4:01 UTC (Fri) by ras (subscriber, #33059) [Link]

> With X they are limited to a particular login, but is that same limitation going to exist for Wayland? Is it going to be possible to run the program independent of the display and connect to it?

Lots of good questions. The answer to all of them is probably something along the lines of "session tracking is broken, let's fix it".

It's not like the problem of keeping something around only while there are references to it hasn't been stumbled over, cursed at and solved a million times already. The answer being proffered here is "session tracking is broken, so we're abandoning it".

That aside, rather than solving the issue at hand in a minimally intrusive way (maybe by sending a SIGHUP to all processes owned by the user when his login session count drops to 0?), they pair it with "followed by an unconditional SIGKILL" because in their opinion the way we have been doing it for the last 30 years is wrong, and we need to be forced down their enlightened path.

If they had decoupled the two, they probably would have got the first one through without much fuss, and if the second one was a good idea it would have become the default in due course.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 3:33 UTC (Thu) by ewen (subscriber, #4772) [Link] (22 responses)

The clearest description of where this went wrong that I've seen is this comment on the Debian bug:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=825394#221

which points out that sending SIGHUP at "session termination" time would have been the compatible thing to do. screen/tmux/nohup, etc, all know how to ignore SIGHUP, and SIGHUP is precisely intended as a "end of user session" indicator, ie, the controlling terminal has gone away. (Now we don't have controlling terminals that much, but we have a more sophisticated idea of "session" -- so "session has gone away" makes more sense as the meaning of SIGHUP.)

The choice to send SIGTERM (ie user initiated termination) rather than SIGHUP (external action initiated, ie session gone) -- and particularly to default to following that up with SIGKILL -- seems to be the root cause of the pain experienced. By contrast, turning on a default of sending SIGHUP at the "end of session", when that's a "GUI session" without a controlling terminal, seems fairly likely to produce the right, backwards compatible, results (since all the "intended to stay running" programs know how to handle SIGHUP, and have for decades; and all the "intended just during login session" do something useful on SIGHUP, even if it's just the default behaviour of exiting).

FWIW, it does seem reasonable to have a "sysadmin enabled, off by default" session manager policy option to also send SIGTERM/SIGKILL at "end of session" if the site policy is "no persistent processes at all". But I don't think that's the common case at all. Particularly for what seems to be the original cause for the change (ie, login session processes persisting "too long" because they didn't get HUP'd on last logout due to not having a controlling terminal).

Ewen

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 5:39 UTC (Thu) by matthias (subscriber, #94967) [Link] (21 responses)

Problem is that SIGHUP semantics do not always work. There are three sensible reactions to SIGHUP:
1. do nothing. kernel will clean up
2. intercept because process wants to survive
3. intercept to do a clean exit (save some data, etc.)

If 3 goes wrong, the process survives. A cleaner implementation would have been to send a SIGHUP followed by some SIGHUP2, with the meaning that SIGHUP2 is only intercepted by daemons, i.e., with the SIGHUP a process does a clean exit and SIGHUP2 will terminate all non-daemons that failed for whatever reason to exit. Unfortunately it will be hard to change these semantics.
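To make the third reaction concrete, here is a minimal sketch of a process that intercepts SIGHUP only to finish its current unit of work, save state and exit. The handler just sets a flag, which keeps the cleanup out of signal context. The names `HangupFlag`, `run_until_hangup`, `do_work` and `save_state` are all invented for illustration. If `do_work` never returns, this is exactly the failure described above: the process survives.

```python
import signal


class HangupFlag:
    """Signal handler object that records that a hangup was requested."""

    def __init__(self):
        self.requested = False

    def __call__(self, signum, frame):
        self.requested = True


def run_until_hangup(flag, do_work, save_state):
    """Repeat do_work() until a hangup is flagged, then save state once."""
    while not flag.requested:
        do_work()
    save_state()


def install(flag):
    """Route SIGHUP to the flag (must be called from the main thread)."""
    signal.signal(signal.SIGHUP, flag)
```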

On top of that we have the problem of processes ignoring SIGHUP for other reasons, as SIGHUP also gets sent without the login session ending (e.g., closure of X terminals). After all, the semantics of SIGHUP have changed over time.

For most of the processes the correct choice is they should not survive the login session. The old behaviour is not really working this way, as SIGHUP is not successful in terminating all these processes. Having the session manager do a SIGTERM/SIGKILL at the end of the session is reasonable. However it needs to know which processes should survive. Therefore we need small changes to very few applications.

- We need some version of nohup that also tells systemd to not kill the process (systemd-run should work, users need to get used to this. Of course one could install a version of nohup that takes care of this)
- programs that start some sort of long-living sessions (e.g., screen, tmux) should really start sessions on their own. Starting a PAM session for screen has also the advantage that the session management can take care of not terminating processes like ssh-agent while they are still needed for the program inside the screen.

This way, session management would be much cleaner independent of a no persisting processes policy. Such a policy would then be implemented by not allowing lingering processes and not allowing access to cron, at, batch (and possibly screen, tmux). With screen and tmux employing proper session management, a user using these programs would still be listed as logged in. So it is harder to hide some processes. Of course, depending on policy someone might want to restrict access to these programs, too.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 6:00 UTC (Thu) by ras (subscriber, #33059) [Link] (14 responses)

> If 3 goes wrong, the process survives.

True. But if you're doing cleanup on exit, you're also doing it for a SIGTERM at the very least. It will be identical code, and if a SIGHUP freezes it's very likely a SIGTERM will do the same thing.

> The old behaviour is not really working this way, as SIGHUP is not successful in terminating all these processes.

Then there is a bug. The situation this change is trying to address is a bug that was introduced deliberately by Gnome / Systemd. They wanted some user services (eg, address book) to stay around between login sessions. This means when the user logged out, it had to ignore SIGHUP.

I've seen several versions of the address book service running on my own laptop, so I've been hit by it myself. I can't say I felt a huge impact beyond thinking "gee, that's untidy". It certainly was not worthy of anything more than raising the energy to file a bug report. I confess it seemed so minor I didn't even bother doing that.

> - We need some version of nohup that also tells systemd to not kill the process (systemd-run should work, users need to get used to this).

Yep, we have it. It's called signal(SIGHUP, SIG_IGN). Then the process won't die, and no one has to learn anything. It ain't rocket science. If you want to enforce a policy of all processes being forced to exit on logout, add a PAM plugin. But most people won't use it because it's not a serious issue on a headless server or a personal laptop, and Gnome will have to find some other way of fixing their bug.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 6:24 UTC (Thu) by matthias (subscriber, #94967) [Link] (13 responses)

>> If 3 goes wrong, the process survives.
> True. But if you're doing cleanup on exit, you're also doing it for a SIGTERM at the very least. It will be identical code and if a SIGHUP freezes it's very likely a SIGTERM will do the same thing.

Therefore systemd sends a SIGKILL after some time. This is meant to bring down the processes where SIGTERM did not work. This is the same mechanism that shutdown has used for decades.
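The SIGTERM-then-SIGKILL pattern referred to here can be sketched for a single child process as follows. This is an illustration of the general technique, not of how systemd or shutdown implement it; the function names and the timeout values are assumptions made for the example.

```python
import signal
import subprocess
import sys


def spawn_stubborn_child():
    """Start a child that ignores SIGTERM (demonstration helper only)."""
    code = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    child = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE)
    child.stdout.readline()  # wait until the child has installed its handler
    return child


def stop_process(proc, grace_seconds=5.0):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns the exit status reported by Popen.wait().
    """
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL cannot be caught or ignored
        return proc.wait()
```

A child that handles SIGTERM promptly never sees the SIGKILL; only the ones that hang or ignore the first signal do, which is the case both sides of this thread are arguing about.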

>> The old behaviour is not really working this way, as SIGHUP is not successful in terminating all these processes.
> Then there is a bug.

Obviously, but it is not only a bug of gnome. I have seen this bug on KDE many years before systemd even existed. Of course it is nice to fix the bugs, but it is also obvious that there always will be some bugs.

>> - We need some version of nohup that also tells systemd to not kill the process (systemd-run should work, users need to get used to this).
> Yep, we have it. It's called signal(SIGHUP, SIG_IGN).

Unfortunately, SIGHUP is also sent in some situations when the login session is not terminating. So there is software ignoring SIGHUP for reasons other than that the process should survive the session. Also, every piece of software with a bug in its SIGHUP signal handler could be a problem. From my experience, problems with the SIGHUP handler are the usual reason for processes lingering around that should have exited.

> But most people won't use it because it's not a serious issue on a headless server or a personal laptop, ...
I expect distros to accept the change, once the few problematic programs have fixes. Most users will not change it back, once screen and tmux work and the manual says use systemd-run for background processes instead of nohup, as they will not encounter any problems.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 7:08 UTC (Thu) by ras (subscriber, #33059) [Link] (12 responses)

>> SIGTERM at the very least. It will be identical code and if a SIGHUP freezes it's very likely a SIGTERM will do the same thing.
>
> Therefore systemd sends a SIGKILL after some time. This is meant to bring down the processes where SIGTERM did not work. This is the same mechanism that shutdown has used for decades.

I think you missed the point. The point is that if there is a bug in the SIGHUP handling, there is also most likely a bug in the SIGTERM handling. Sending a SIGKILL does not fix the problem. It hides it. Assuming the application is trapping both of these for a reason such as saving data, then fixing the bug is the correct path - not hiding it.

That said, it's been a long while since I've seen either problem. Until now, when GNOME introduced it as a "feature".

> Obviously, but it is not only a bug of gnome. I have seen this bug on KDE many years before systemd even existed.

Yep, and it was fixed by KDE long long ago. The difference is that GNOME / Systemd doesn't consider what they have done to be a bug - it's a new feature. Then to fix the bugs their new feature introduced they want to break backward compatibility with systems that don't use GNOME. KDE had the decency to fix their bugs without using it as an excuse to inflict their version of Utopia on everyone else.

> Unfortunately, SIGHUP is also sent in some situations when the login session is not terminating.

Only when it's been co-opted for other purposes - like reloading the configuration in system daemons. And if the particular program does that, either they are uninterested in knowing when the user logged out, or they have introduced a bug because there is no other way to know short of polling. This won't change under the proposed new regime, as SIGHUP will remain the way a process learns the login session has ended.

> I expect distros to accept the change, once the few problematic programs have fixes.

We will see. As the article points out, the tmux people don't see their program as the problematic one in this case, and from what I can tell a fairly large cohort of people agree with them.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 9:09 UTC (Thu) by matthias (subscriber, #94967) [Link] (11 responses)

>> Therefore systemd sends a SIGKILL after some time. This is meant to bring down the processes where SIGTERM did not work. This is the same mechanism that shutdown has used for decades.
> I think you missed the point. The point is that if there is a bug in the SIGHUP handling, there is also most likely a bug in the SIGTERM handling. Sending a SIGKILL does not fix the problem. It hides it. Assuming the application is trapping both of these for a reason such as saving data, then fixing the bug is the correct path - not hiding it.

I am arguing that session shutdown is like system shutdown. For decades we have used SIGTERM/SIGKILL when shutting down the system. Would you argue that when I type shutdown -r now and some application is not terminating cleanly, then the system should hang forever because sending a SIGKILL after SIGTERM is hiding bugs?

I fully agree that bugs should be fixed, but on the other hand some fundamental things as session management should handle bugs of applications gracefully.

>> Unfortunately, SIGHUP is also sent in some situations when the login session is not terminating.
> Only when it's been co-opted for other purposes - like reloading the configuration in system daemons.
Or a pty going away because an X terminal is closed. Not every X terminal is a session on its own. Semantics have changed in time.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 10, 2016 2:49 UTC (Fri) by ras (subscriber, #33059) [Link] (10 responses)

> I am arguing that session shutdown is like system shutdown.

Yes, I knew that and should have addressed it, I guess. But it did sound to me like "I use a hammer to crack nuts, so why not an egg?"

The way I see it is: if a process hangs around after logout when it shouldn't, about the only harm done is a little lost RAM, or at worst a pinned CPU if it's gone infinite. If that happens and it bothers you, the fix is also simple: kill it. On the other hand, automatically killing a process when it hasn't shut down properly delays getting the bug fixed. And there is a bug that needs to be fixed: either it doesn't matter, in which case why is it trapping SIGHUP at all, or it does matter and tears will follow one day.

If a process doesn't stop on shutdown the implications are much more severe. I've lost control of remote servers because of it. Plane flights cost time and money. The consequences of killing the process are the same in both cases: both result in loss of information. It's the consequences of shutdown not happening that are very different.

> Or a pty going away because an X terminal is closed. Not every X terminal is a session on its own. Semantics have changed in time.

Yes they have. The session id the kernel used has been co-opted for all sorts of purposes now. This is the real problem you are grappling with. We used all sorts of kludges to get around it, but apparently these GNOME changes were the straw that broke the camel's back. It seems session tracking is now far too hard, so rather than track sessions you've decided killing all processes belonging to a user is the way to go.

Obviously it's a kludge. It's racy (what if a person logs in between the 0->1 test and you lot starting to kill processes), and it won't always work (what about processes started as a different user), and it isn't backward compatible with what worked for 30 years now.

I'd have more sympathy if you were trying to get something simple done and stumbled onto this mess. Instead, you lot with your multiple systemd process trees are responsible for the worst aspects of it. And all this so you can optimise multiple GNOME sessions on the one machine. Does that even happen?

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 10, 2016 3:23 UTC (Fri) by pizza (subscriber, #46) [Link]

> And all this so you can optimise multiple GNOME sessions on the one machine. Does that even happen?

Yes.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 10, 2016 8:55 UTC (Fri) by matthias (subscriber, #94967) [Link] (8 responses)

Ok, I agree that the consequences in case of a shutdown/reboot are usually more severe. I was just bitten by this during my studies, as we constantly had some PCs in the pool with some processes that instead of exiting cleanly started using 100% CPU. And yes, these were bugs, not students trying to help SETI with university computing power. Killing was often not possible as the admins had different working times than the students and obviously a normal user cannot kill processes of different users. The systemd KillUserProcesses would have been very welcome.

> Obviously it's a kludge. It's racy (what if a person logs in between the 0->1 test and you lot starting to kill processes),

In contrast to solutions with pkill, systemd should only kill processes belonging to closed sessions (and not of the new session), as it tracks sessions with cgroups. There might be a race condition if the new session decides to use some process of an old session which gets killed. I am not sure whether systemd removes this race by delaying the start of the new session while on a killing spree. This should be possible.
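To illustrate what "tracks sessions with cgroups" means in practice: on a systemd-logind machine, the lines of /proc/<pid>/cgroup have the form hierarchy-ID:controllers:path, and a process in a login session typically has a path containing a `session-<id>.scope` component. The following sketch parses that text. The function name and the sample strings are made up for the example, and the exact cgroup layout varies between systemd versions and cgroup v1/v2 setups.

```python
import re

# Matches a ".../session-<id>.scope" path component.
_SESSION_SCOPE = re.compile(r"/session-([^/]+)\.scope(?:/|$)")


def session_from_cgroup(cgroup_text):
    """Return the logind session id found in /proc/<pid>/cgroup text, or None."""
    for line in cgroup_text.splitlines():
        parts = line.split(":", 2)  # hierarchy-ID, controller list, path
        if len(parts) != 3:
            continue
        match = _SESSION_SCOPE.search(parts[2])
        if match:
            return match.group(1)
    return None


# Illustrative input; on a real system read open("/proc/self/cgroup").read().
sample = "1:name=systemd:/user.slice/user-1000.slice/session-3.scope\n"
print(session_from_cgroup(sample))
```

Because the membership is recorded by the kernel per process and inherited by children, a process cannot leave its session's cgroup just by double-forking or calling setsid(), which is the difference from matching on user id with pkill.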

> and it won't always work (what about processes started as a different user),
I just tested this starting some process with su as a different user (I temporarily added my test user to the wheel group). The process was terminated, because it was in the same cgroup. This case should not be that important anyway, as normal users are not allowed to start processes as different users.

> and it isn't backward compatible with what worked for 30 years now.
I agree, but the programs that need changes are very few. For most cases, background processes are started as daemons anyway. I always see screen and tmux mentioned and their session management is broken anyway. Helper programs like ssh-agent get terminated when the user logs out, even when they are needed inside the screen. Registering a session with PAM would be cleaner anyway.

Obviously this change should only hit stable distributions, once screen and tmux are fixed (either upstream or by distribution patches).

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 11, 2016 0:30 UTC (Sat) by ras (subscriber, #33059) [Link] (7 responses)

> The systemd KillUserProcesses would have been very welcome.

Yes, I can well imagine it would be.

But you did have another option: http://lwn.net/Articles/690555/

Well maybe not that precisely, but with a few tweaks you could have made it kill all processes owned by a student when his last session was gone, and unlike the current proposal made it very selective so only the students were affected and not sysadmins or others squawking here. This is exactly the sort of problem PAM's session management is well suited to.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 11, 2016 2:06 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

So why is it any better than a setting in user config file?

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 11, 2016 8:42 UTC (Sat) by ras (subscriber, #33059) [Link] (5 responses)

> So why is it any better than a setting in user config file?

It was just a work-around, and I don't doubt there are many people who think getting upstream to provide a config option for their particular problem is a better solution. Maybe you are one of them. I'm not.

There is a 50 line solution to the problem. It isn't a patch to upstream I have to carry, the API is stable, a compile isn't required, it doesn't require me to monitor upstream security problems and rebuild it with every fix - it's just drop a file into a directory and go. If I was the sysadmin being given grief by miscreant students, I know I would have invested the hour needed to write it. If as claimed there are lots of other sysadmins with the same problem, I am somewhat puzzled that it isn't packaged and available on the major distros already, because if it had been it would have been just a setting in PAM's config file.

Which brings us to the real point. I don't use Linux because it has a setting in a config file for my every need - that's an impossible ask after all. (If I believed it was possible, I would be using Windows. Obviously it's not there yet, but given it's possible it must be just around the corner ...) I use 'nix because it's a Swiss army knife that is so flexible that for most problems there is a 50 line solution.

The KillUserProcesses setting looks nothing like that. Elsewhere you said it can be controlled per user. What if I don't think per user is particularly useful? Maybe I'm a sysadmin with a miscreant student population in a large educational institution that turns over staff regularly, and with every change of staff I have to change the systemd configuration on 100's of machines. I don't think so. Give me a system that provides the flexibility to configure it in a way that suits me. Maybe I put all students in the one group, or maybe I look up payroll, or read a flag out of FreeIPA.

I'd take that flexibility over a specialised "config option" any day. Quite apart from anything else, I could not be as productive in my professional life without it.
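The "put all students in the one group" variant could be sketched as a small policy function that a session-close hook consults before killing anything. Everything named here is hypothetical: the group name "students", the function names, and the idea of wiring it into a PAM module are assumptions for illustration only.

```python
import grp


def load_group_members():
    """Map group name -> set of member names from the system group database.

    Note: gr_mem lists supplementary members only; users who have the
    group as their primary group are not included.
    """
    return {entry.gr_name: set(entry.gr_mem) for entry in grp.getgrall()}


def should_kill_on_logout(user, members_by_group, policy_groups=("students",)):
    """Return True if `user` belongs to any group the site policy targets."""
    return any(
        user in members_by_group.get(group, ())
        for group in policy_groups
    )
```

Swapping the lookup for payroll or a directory service only means replacing `load_group_members`; the decision function stays the same.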

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 11, 2016 11:25 UTC (Sat) by pizza (subscriber, #46) [Link] (4 responses)

> I'd take that flexibility over a specialised "config option" any day. Quite apart from anything else, I could not be as productive in my professional life without it.

Then it's a good thing that you're not forced to choose between those two, eh?

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 11, 2016 11:55 UTC (Sat) by ras (subscriber, #33059) [Link] (3 responses)

> Then it's a good thing that you're not forced to choose between those two, eh?

If we could leave it turned off with no repercussions other than our tmux sessions continue to run, there wouldn't be almost 300 posts on LWN about this. The reality is, if we want GNOME to clean up properly, we have to enable KillUserProcesses. Frankly I'd even accept that, albeit for purely selfish reasons as I'm not a fan of GNOME 3. Unfortunately many of the other window managers rely on GNOME to fill the gaps in their own efforts, including the one I use on my laptop.

This doesn't feel like we are being offered a choice.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 11, 2016 14:55 UTC (Sat) by pizza (subscriber, #46) [Link]

> If we could leave it turned off with no repercussions other than our tmux sessions continue to run, there wouldn't be almost 300 posts on LWN about this.

You are, in a word, incorrect.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 12, 2016 10:20 UTC (Sun) by micka (subscriber, #38720) [Link] (1 responses)

> there wouldn't be almost 300 posts on LWN about this

Slowly reading through them. From what I've read up until now two thirds are from 3 or 4 persons. I'm not sure what you can deduce from the number of comments except that there are very talkative commenters.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 17, 2016 16:01 UTC (Fri) by Wol (subscriber, #4433) [Link]

:-)

Certain topics press certain buttons. Gnome brings out one set of posters. Systemd brings out another (and I've noticed systemd tends to attract troll accounts I've never seen before ...)

And databases? Well that tends to get me going :-) It's all about what matters to people. And some people just enjoy sitting in the peanut gallery lobbing rotten tomatoes ... :-)

Cheers,
Wol

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 6:47 UTC (Thu) by ewen (subscriber, #4772) [Link] (5 responses)

Your case 1 is a normal process, and the normal kernel termination kicks in as planned. Everyone seems happy with this case.

Your case 2 is either (a) a "long running daemon", which these days is typically launched by some sort of "init" process (directly or indirectly) and so has its own session (and is thus okay), or (b) a "long running user process" (screen, tmux, nohup background process, etc) which is detached from the controlling terminal and (in systemd land) has a "user session". In both cases (apart from any "site policy") it's intended, by the user/process that started them, that they survive. (And as someone else suggests, PAM seems a good place to put such "all processes must go away on user GUI logout" site policy.)

Your case 3 is a "smarter" background process that wants to, eg, save state and *then* exit. In which case on receiving SIGHUP it should exit very soon afterwards, no problem. If there's a bug that prevents it from getting to exiting in a timely fashion... then that's a bug and should be fixed in the program. Having a "nuke it from orbit, it's the only way to be sure" (SIGTERM/SIGKILL) approach that affects all processes "just in case" of bugs in the occasional program seems... excessive.

As you allude to there is an issue with the "background process that wants to exit cleanly" if they are currently attempting to use SIGHUP for something else (eg, "reread config"). The obvious solution to that problem is to hook their "reread config" option up to another signal -- SIGUSR1 is common. ("SIGHUP to reread config" makes sense as a convention only for running daemons, because it's a "soft exit" -- ie, act as if you exited and started again loading the new config, but without actually exiting. But even there it's at best a kludge. Just a really long standing convention. SIGUSR1 is another common convention for "reread config" or "soft restart", also used for decades.)
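
To make the split above concrete, here is a minimal sketch (not taken from any of the programs discussed; the `Worker` class and its fields are made-up names) of a long-running job that treats SIGHUP as "save state and exit" and hooks "reread config" up to SIGUSR1 instead:

```python
import signal


class Worker:
    """Toy long-running job. All names here are illustrative assumptions."""

    def __init__(self):
        self.config_reloads = 0
        self.state_saved = False
        self.should_exit = False

    def on_hangup(self, signum, frame):
        # SIGHUP: the session is going away. A real program would write its
        # state out here (or in its main loop) and then exit promptly.
        self.state_saved = True
        self.should_exit = True

    def on_reload(self, signum, frame):
        # SIGUSR1: "reread config" lives here, leaving SIGHUP free.
        self.config_reloads += 1

    def install(self):
        # Returns the previous handlers so a caller can restore them.
        return {
            signal.SIGHUP: signal.signal(signal.SIGHUP, self.on_hangup),
            signal.SIGUSR1: signal.signal(signal.SIGUSR1, self.on_reload),
        }


def demo():
    worker = Worker()
    previous = worker.install()
    try:
        signal.raise_signal(signal.SIGUSR1)  # ask for a config reload
        signal.raise_signal(signal.SIGHUP)   # pretend the session ended
    finally:
        for signum, handler in previous.items():
            # A handler of None means "not installed from Python".
            signal.signal(signum, signal.SIG_DFL if handler is None else handler)
    return worker
```

The handlers only set flags, so the actual saving and exiting would happen in the program's normal control flow rather than inside the signal handler.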

It still seems to me there's no need to force every long running program to be rewritten to be "systemd/session" aware, since there's a long standing convention (SIGHUP when the session is going away) that can be used again here and all the relevant programs already understand that so it would be elegantly backwards compatible.

Ewen

PS: If one were starting again from scratch, without 30+ years of history, it's arguable that having screen/tmux/nohup being "session aware" makes sense. But in historical terms they are: they start new (1990s) sessions by detaching from the controlling terminal, which is how "sessions" worked for decades (see, eg, Stevens "Advanced Programming in the Unix Environment"). The semantics/needs haven't changed in the last 30 years, only what indicates "the session" and "end of session".
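
For anyone unsure what "starting a new session" means in the historical sense, this is a rough sketch using only the Python standard library. It shows the classic POSIX `setsid()` kind of session, which is what screen/tmux/daemon() create; the helper name `child_ids` is just for illustration. As this thread makes clear, a session of this kind is not the same thing as a logind session, so this alone is not expected to exempt a process from KillUserProcesses.

```python
import os
import subprocess
import sys

# The child reports its own PID and the ID of the session it belongs to.
CHILD = "import os; print(os.getpid(), os.getsid(0))"


def child_ids(detach):
    """Run a short-lived child and return its (pid, session id)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        start_new_session=detach,  # True makes the child call setsid()
        capture_output=True,
        text=True,
        check=True,
    )
    pid, sid = map(int, result.stdout.split())
    return pid, sid
```

With `detach=True` the child is the leader of a brand new session (its session ID equals its own PID); with `detach=False` it stays in the caller's session.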

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 8:52 UTC (Thu) by matthias (subscriber, #94967) [Link] (4 responses)

Case 3 is not necessarily a background process. It is every foreground process that needs to save data before exiting. You have to intercept SIGHUP, as otherwise you cannot do anything and the process just dies. We should not rely on the fact that every program is correct.

The problem is that there are many cases where a program has to intercept SIGHUP even though the program does not want to survive the session.

I agree fully that not every long running program should be systemd/session aware. The overwhelming majority of daemons just work, as they are started as daemons. If a user wants a special program to survive, the user should use systemd-run instead of nohup in a systemd setting. No need to change the program. The only programs needing changes are programs starting new sessions themselves (screen/tmux; if you have other examples, I am still waiting to hear of them). These programs should register a PAM session (not only for not being killed but also for proper management of associated processes like ssh-agent).

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 9:22 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

> the user should use systemd-run instead of nohup
Why? Just wrap/replace nohup with a script that simply invokes systemd-run.
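
A rough sketch of such a wrapper, for illustration only: the function name and the fallback logic are my own assumptions, and whether `systemd-run --user --scope` is sufficient on a given system depends on how logind and the user's systemd instance are configured there. It also ignores the output redirection that real nohup does.

```python
import shutil


def survive_logout_argv(command, have_systemd_run=None):
    """Build an argv that tries to keep `command` alive after logout."""
    if have_systemd_run is None:
        # Probe $PATH unless the caller already knows the answer.
        have_systemd_run = shutil.which("systemd-run") is not None
    if have_systemd_run:
        # Run the command in its own scope under the user's systemd instance.
        return ["systemd-run", "--user", "--scope", *command]
    # No systemd-run available: fall back to plain nohup.
    return ["nohup", *command]
```

The resulting list could be handed to `subprocess.run()` or `os.execvp()` by a small script installed under whatever name the distribution prefers.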

Now, I'm quite partial to doing ctrl-Z/bg/disown dance to background unexpectedly long running jobs. E.g. I've started a build and it suddenly decided to download a new version of the docker image. Very slowly.

What would be the best equivalent in this case?

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 16:10 UTC (Thu) by cortana (subscriber, #24596) [Link] (1 responses)

Probably some kind of command that moves the disowned process (BTW, what's the difference between SIGSTOP, bg & exiting the shell, and SIGSTOP, bg, disown and exiting?) into its own scope unit. So a bit like 'systemd-run --scope --user' except with an existing process, rather than a new process that the user's systemd instance launched.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 16:50 UTC (Thu) by viro (subscriber, #7872) [Link]

disown doesn't do anything to the process; it's a shell builtin that makes shell forget about that job, so that when it comes to shell-sent SIGHUP the job in question won't be affected.

The point is that back when you had launched that sucker you had no idea that it might need to be left to run - otherwise you would've used nohup to start it in the first place. And yes, there's a bunch of real-world situations where you really don't want to kill the damn thing and restart it from scratch, this time with nohup. Consider a case when what you expected to be a couple of hours of calculations you've started in an ssh session on a big fast box at 1pm, only to discover at 11pm, when you get around to checking what it has produced, that it's only ~2/3 of the way through. And you really have to disconnect the laptop you'd been using and leave. Killing that sucker and starting it with nohup means that results won't be there until 2pm tomorrow instead of waiting for you when you get back there in the morning. Sure, you ought to have added checkpointing, etc., but the whole problem is that it was supposed to be a one-off thing, and a reasonably quick one. I've no idea whether that's the scenario original poster had in mind, but it definitely does happen. disown can save you a lot of PITA in such case.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 9:33 UTC (Thu) by ewen (subscriber, #4772) [Link]

If a foreground process is "resisting exiting" (ie, catching SIGHUP and doing something first), then (a) (almost by definition) that means that there is something user-visible which is obviously not exiting, (b) it is quite likely because the foreground process is trying to ask the user a question ("should I save your file first?") or similar and (c) thus probably *should* cause the exit of the session (eg, GUI desktop logout) to be deferred until a suitable answer can be achieved and state saved.

In the unlikely event that the *foreground* process still isn't behaving properly, and is blocking the session exit as a result, most desktops provide the user with tools to... persuade the process to exit. ("Force Quit" and the like -- which can do the SIGTERM/SIGKILL dance, at the user's explicit request, for the single buggy *foreground* process.)

AFAICT foreground processes were already doing the right thing; the change seems to have been caused solely by background "session-wide" processes (user dbus and the like) and the changes in how they worked.

FWIW, as I said earlier programs like screen/tmux/etc *are* starting new sessions, as that's been historically defined (by detaching from the controlling terminal, calling daemon(), etc). The discussion (in the tmux issue) about putting this "one init/session manager" specific behaviour in a commonly used place (eg, daemon()), so a simple recompile/relink picks it up seems more appropriate than requiring each "working for years" tool to suddenly have to add special code just to avoid being killed in one context. (Even PAM isn't necessarily used on all platforms that tools like screen, tmux, etc, use.)

And IMHO, breaking nohup and then saying "well you should use this other tool (systemd-run) instead just on systems that have this init system" is also unfortunate. It'd probably be better for distributions to install a nohup that continued to "do the right thing" so the background process was allowed to continue to run, keeping the long standing (25+ years I know about) semantics of "nohup".
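
For reference, the long-standing "nohup" semantics boil down to: set SIGHUP to "ignore", then exec the command, since an ignored signal stays ignored across exec(). A minimal sketch of that mechanism follows (the names `ignore_sighup` and `survives_hangup` are made up for the example; real nohup also redirects output, which is skipped here):

```python
import signal
import subprocess
import sys

# Stand-in for a long-running job.
SLEEPER = "import time; time.sleep(30)"


def ignore_sighup():
    # Runs in the child between fork() and exec(); an ignored signal
    # stays ignored across exec(), which is the trick nohup relies on.
    signal.signal(signal.SIGHUP, signal.SIG_IGN)


def survives_hangup(nohup_like):
    """Start a child, send it SIGHUP, and report whether it is still alive."""
    child = subprocess.Popen(
        [sys.executable, "-c", SLEEPER],
        preexec_fn=ignore_sighup if nohup_like else None,
    )
    try:
        child.send_signal(signal.SIGHUP)
        try:
            child.wait(timeout=1.0)
            return False  # the child exited
        except subprocess.TimeoutExpired:
            return True   # still running after SIGHUP
    finally:
        child.kill()  # clean up the demo child either way
        child.wait()
```

Note that this only protects against SIGHUP. It does nothing against a session manager that follows up with SIGTERM/SIGKILL, which is exactly the behaviour change being argued about here.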

Ewen

What breakage does this actually fix?

Posted Jun 8, 2016 5:35 UTC (Wed) by iamsrp (subscriber, #84011) [Link] (16 responses)

Is there actually a non-hypothetical and specific problem which this is solving? If so, does solving that problem outweigh the resultant breakage? I've spent half an hour trying to answer the first question here, with no results for my efforts. Perhaps I'm just not very bright this evening...

srp

What breakage does this actually fix?

Posted Jun 8, 2016 5:59 UTC (Wed) by marcH (subscriber, #57642) [Link] (2 responses)

Lennart got tired of (re-)configuring his grand-ma and cousin's PC.

The Linux desktop is coming!

What breakage does this actually fix?

Posted Jun 8, 2016 7:25 UTC (Wed) by SimonO (guest, #56318) [Link] (1 responses)

On the (single user) desktop, I don't see much of a problem with this idea.

On servers, cluster nodes in a scientific environment, shared desktops, etc. I'd expect this to be disabled by default. With Linux that is the majority of use-cases, as single user linux-desktop is a small niche of the whole spectrum I guess...

The whole thing appears to be in the hands of distributors. Systemd is a raw tool (like a swiss army knife with a BFG 9000 included ;-) and they need to expose only the safe parts for normal use and include some safety precautions for the BFG.

If all distro's have to put in work to make every new systemd release palatable it should become an upstream problem to reduce the redundant work in all distro's.

What breakage does this actually fix?

Posted Jun 8, 2016 8:34 UTC (Wed) by ovitters (guest, #27950) [Link]

> If all distro's have to put in work to make every new systemd release palatable

It's not that much work to integrate new systemd releases. It often aligns behaviour and reduces differences across distributions, making things easier. It is usually the differences causing the interesting bugs. Theoretically you should be able to modify anything you want, but it does lead to bugs.

What breakage does this actually fix?

Posted Jun 8, 2016 6:36 UTC (Wed) by oldnpastit (subscriber, #95303) [Link] (2 responses)

Somebody who knows a lot more about systemd than I found this reason:

> The commit that made the change:
> https://github.com/systemd/systemd/commit/97e5530cf20
> referenced two bugreports:
>
> https://bugs.freedesktop.org/show_bug.cgi?id=94508
> https://github.com/systemd/systemd/issues/2900

And if you go look at those bugreports, the first of them is complaining that latest dbus leaves processes lying around.

And if you go investigate why dbus is now leaving processes around, it's because gnome leaves special magical gnome per-user processes running after the user's session has terminated. And that in turn is relying on some new systemd features to create a user session which persists (or something, I don't really understand systemd).

What breakage does this actually fix?

Posted Jun 8, 2016 18:56 UTC (Wed) by drag (guest, #31333) [Link]

> (or something, I don't really understand systemd).

Remember how it was common in Gnome 2-land to have a session manager process? So you could go into there and say 'I want FOO started when I log in'? People would use it for all sorts of fun things. Starting up browsers, launching their terminals, blocking daemons they didn't want launched.

Then along came XDG and the various *.desktop things to make app menus cross-platform. So you could put *.desktop files in ~/.config/autostart/ and have them autostart when you logged into X (provided you used a desktop environment that was compliant).

Systemd wants to do that same sort of thing for your user, but have it be system-wide. That is one of the reasons they want Kdbus.. so you wouldn't have to run all the dbus-launch stuff for each login. You could have a dbus for a user account and have it 'just work'. Thus you have 'user sessions'.

So a lot of the commands you use for managing the init system for your system can be used to manage your user session for your user by using the '--user' option.

So for example I have a ~/.config/systemd/user/synergys.service on my desktop. When I start up my laptop I have a corresponding synergyc service that launches and they try to talk over a ssh session. I can automate the management of all of this through systemd and '--user'.

systemctl enable --user synergys

systemctl start --user synergys

systemctl status --user synergys

journalctl --user -f

etc etc. These things work as expected, but just for your user account.

What breakage does this actually fix?

Posted Jun 9, 2016 16:06 UTC (Thu) by cortana (subscriber, #24596) [Link]

Thanks for highlighting that. The freedesktop bug is really illuminating, and I wish that people would read it, in full, and understand it before they fly off the handle blaming 'GNOME'... sigh!

What breakage does this actually fix?

Posted Jun 8, 2016 8:49 UTC (Wed) by matthias (subscriber, #94967) [Link] (8 responses)

Which breakage? I expect the obvious programs (nohup, screen, tmux) to be fixed (either upstream or downstream) before this hits stable distributions.

I have seen processes consuming 100% CPU just because shutdown did not work (the SIGHUP handler was there but failed to terminate the process). Having the session management kill such processes seems reasonable. Therefore it has to know which processes should survive.

What breakage does this actually fix?

Posted Jun 8, 2016 10:25 UTC (Wed) by NAR (subscriber, #1313) [Link] (7 responses)

In what usecase is it an actual problem? In a single user desktop/laptop setting I presume you shut down the system after logout (why else would you log out?), so the process gets killed anyway. In a multiuser server setting the system administrator will notice that there's a rogue process using 100% CPU and will kill that process (preferably after contacting the user and ensuring that the process in question is indeed rogue and not something actually doing an overnight calculation or something like that). The system administrator has to watch out for these kinds of processes even if users are logged in anyway. I do think it's the small minority case where this feature could be useful. As far as I understood the article, 3 distributions out of the 3 that made a decision on this feature disabled the feature, so the default should be disabled upstream too.

As to the technical choice that every program that wants to avoid getting killed after logout has to link against libsystemd - throwing out binary compatibility again. They could have provided some kind of wrapper, so screen could be aliased to something like "wrapper screen".

What breakage does this actually fix?

Posted Jun 8, 2016 11:21 UTC (Wed) by matthias (subscriber, #94967) [Link] (1 responses)

I had this problem in university pools. While a user is logged in, there is no problem. When the user notices that the desktop is too slow, the user can kill his own process. The problem occurs when such a process is left after logout. The behaviour of killing all processes that are not meant to survive seems very reasonable.

And asking that the administrator should do the job of session management seems like a joke to me. It is enough work when the administrator has to react to a user doing malicious things on purpose.

I expect that the usual tools for having background processes are changed before this change hits stable distributions. Before that, disabling the feature is a reasonable choice. tmux and screen could need proper session management (using PAM) anyway (e.g., to avoid that the ssh-agent gets terminated when the user logs out). The current behaviour looks broken even without the systemd change. PAM will do the necessary things to avoid systemd killing the tmux/screen session in this case. I do not see a problem, when a distribution, which uses systemd as PID 1, needs a version of nohup that is linked against libsystemd.

Once these changes are in place, the distributions can enable the feature again.

What breakage does this actually fix?

Posted Jun 8, 2016 13:18 UTC (Wed) by NAR (subscriber, #1313) [Link]

I thought those kinds of pools went out of fashion a long time ago; nowadays university students carry computing devices much stronger than the computers at the pools I used to use... The setting makes some sense in this environment, but I don't think this is the default. Anyway, even in this setting I might prefer to shut down the computer after logout and boot before login in order to save power (systemd is supposed to make it faster). And this setup does need good monitoring tools to avoid the students running servers there.

What breakage does this actually fix?

Posted Jun 8, 2016 21:36 UTC (Wed) by xtifr (guest, #143) [Link] (4 responses)

> In a single user desktop/laptop setting I presume you shutdown the system after logout (why else would you logout?)

Like a lot of programmers (and other technical people), you're ignoring the very common case of the *family* computer! Dad may log out so the kid can work on a school paper. Doesn't mean that dad wants his emacs "daemon" to die. In fact, he may be relying on it not dying!

What breakage does this actually fix?

Posted Jun 8, 2016 23:50 UTC (Wed) by johannbg (guest, #65743) [Link] (3 responses)

I guess that family dad has never heard of "Fast User Switching" [1] which exist to handle exactly that usecase.

1. https://en.wikipedia.org/wiki/Fast_user_switching

What breakage does this actually fix?

Posted Jun 9, 2016 4:26 UTC (Thu) by xtifr (guest, #143) [Link] (2 responses)

Or maybe dad doesn't *want* to leave *all* his processes running and consuming resources. Maybe his machine simply isn't powerful enough to have five (two parents, three kids) complete desktop sessions all running, but is powerful enough to keep a couple of relatively lightweight emacs daemons or screen sessions running. Maybe he tried fast user switching and saw it bring his machine to its knees.

Not everyone has a screaming, top-of-the-line, latest model machine with all the trimmings. Especially those who are trying to feed three or more children! :)

What breakage does this actually fix?

Posted Jun 9, 2016 10:37 UTC (Thu) by NAR (subscriber, #1313) [Link] (1 responses)

Granted, I have used fast user switching only on Windows, but it didn't seem to slow down the system. There might be some swapping at the user switch, but otherwise the left processes shouldn't use much CPU (unless dad was encoding a DVD - in that case the process killing would be an even worse solution).

What breakage does this actually fix?

Posted Jun 9, 2016 21:45 UTC (Thu) by mstone_ (subscriber, #66309) [Link]

"some swapping"

^^^ lol, yeah, the machine's unusable for several minutes every time someone "fast" user switches.

What breakage does this actually fix?

Posted Jun 9, 2016 2:51 UTC (Thu) by kokada (guest, #92849) [Link]

This change actually solves a problem that I have had with Gnome for quite a long time (since 3.18? if so, it is already one year old): sometimes when I try to shut down my system, it simply hangs waiting for some process to finish. I thought this was a systemd bug, however after investigating a little I found it was some Gnome process hanging when asked to finish, and systemd was painfully waiting for it to finish.

The workaround was to reduce the time that systemd waits for a process to be killed, however this is a more definitive and less hacky solution, at the cost of changing how you think about *nix logins.

And yeah, this is a problem with Gnome that should be fixed. However it is the kind of problem that is intermittent, does not occur for everyone, and comes back after a while (I didn't have this problem in some point update of Gnome 3.18, then it came back in 3.20). And I remember having this problem in KDE too, even before systemd existed. So yeah, an old and annoying problem.

Distributors ponder a systemd change

Posted Jun 8, 2016 5:56 UTC (Wed) by marcH (subscriber, #57642) [Link]

14 years old openssh bug. Fixed?
sshd orphans processes when no pty allocated https://bugzilla.mindrot.org/show_bug.cgi?id=396

Distributors ponder a systemd change

Posted Jun 8, 2016 7:44 UTC (Wed) by jaromil (guest, #97970) [Link] (5 responses)

> For all the fuss, one might well argue that the development community is working as it should.

If we include and respect the forks and the freedom some of us have taken to opt out (https://lwn.net/Articles/685521/) then yes, I agree with you. But please keep such reasonable and calm tones also when you evaluate disagreement in action. Hooligans do not help anyone really.

Many of us do not trust how systemd reasons and takes decisions, for us this new tmux story is just the tip of an iceberg and, as professionals, we cannot double up our work at the discretion of such disruptive upstream changes and polarized visions of how GNU/Linux systems should work.

So, thanks for this fine round-up, may the dust in your camp settle and good luck trusting the systemd developer team. I simply don't trust them and believe that fixing a default now and then won't solve the root of the problem.

ciao

Distributors ponder a systemd change

Posted Jun 8, 2016 8:22 UTC (Wed) by ovitters (guest, #27950) [Link] (4 responses)

> good luck trusting the systemd developer team

It is humorous that you're using your message to complain about how Devuan is not respect, while above quote is pretty lacking respect IMO. In the last article I pointed out how aggressive the Devuan mailing list is.

> If we include and respect the forks and the freedom some of us have taken to opt out

So you want to be included in on a systemd discussion, while you've opted out of systemd?!?

Distributors ponder a systemd change

Posted Jun 8, 2016 8:35 UTC (Wed) by jaromil (guest, #97970) [Link] (3 responses)

>> good luck trusting the systemd developer team

> It is humorous that you're using your message to complain about how Devuan is not respect, while above quote is pretty lacking respect IMO.

I'm not entirely sure what you are trying to say in English language, however I am not lacking respect. I sincerely wish everyone going the systemd way good luck. Further in my message I state that I do not trust the systemd developers. I'm free to do so and that is not lack of respect, that is basic communication. You are free to not trust me. That's how networks of trust are made, actually.

I take the occasion to state that my message was not directed to Corbet, who has been very correct in including Devuan in the picture with his past article, for which even he has received immediate and very disrespectful criticism in response.
My message is directed to you and all the other hooligans. I hope you understand it, but I'm noticing it may just be inflammatory, so I'll ignore further manipulations in this thread.

>> If we include and respect the forks and the freedom some of us have taken to opt out
>So you want to be included in on a systemd discussion, while you've opted out of systemd?!?

I have the right to be in this forum as much as you do. I have opinions about systemd, I like to debate them and I'm even open to criticism.

Please note that our project, Devuan, has been among the first to struggle to avoid personal attacks and create a platform for civil debate and constructive action in the middle of a very unpleasant escalation of views about systemd. Please help me perceive that systemd is not made of young bullies and a crowd of hooligan supporters. Your approach now does nurture this perception in me.

ciao

Distributors ponder a systemd change

Posted Jun 8, 2016 8:51 UTC (Wed) by jaromil (guest, #97970) [Link]

FTR: you were the one labeling our Devuan dng mailinglist as 'toxic' on the basis of very little factual interaction.
You did so while disregarding the answers that official Devuan developers have taken the time to give to your questions, and instead went on quoting answers from other random posters who are not affiliated with the mailinglist.
Reference: https://lwn.net/Articles/685750/
MVG

Please

Posted Jun 8, 2016 13:04 UTC (Wed) by corbet (editor, #1) [Link] (1 responses)

If you are going to talk about respect and avoiding personal attacks, calling others "hooligans" is probably just not the best way to go about it. Can we please keep the name-calling out of the discussion?

Please

Posted Jun 8, 2016 15:48 UTC (Wed) by jaromil (guest, #97970) [Link]

Sure, sorry. In the last messages was indeed my resentment talking.
Please note in my first reply I call 'hooligans' people acting as such on both camps.
ciao

Distributors ponder a systemd change

Posted Jun 8, 2016 8:01 UTC (Wed) by peter-b (guest, #66996) [Link] (7 responses)

This change in default configuration options seems like a really good and long-overdue improvement, and I really don't see what all the fuss is about.

Distributors ponder a systemd change

Posted Jun 8, 2016 8:28 UTC (Wed) by ovitters (guest, #27950) [Link] (6 responses)

They attempted to make the change after ensuring the affected programs (tmux, screen, nohup) are updated. Once those programs are fixed, nobody should notice. Unfortunately tmux bugreport became unreadable with lots of non-developer comments. Initially it seemed they were ok with the patch, then changed their minds. Might misunderstand things.

So current status is that people would notice. Then I can understand. Unfortunately various times maintainers reject anything to do with systemd. Leading to no other solution than to force things. Systemd developers seem to be a bit aggressive to start pushing things, but on the other hand, there's not been too many changes.

This is part of the user sessions, which Lennart talked+blogged about for many years.

Distributors ponder a systemd change

Posted Jun 8, 2016 8:52 UTC (Wed) by paulj (subscriber, #341) [Link] (3 responses)

Why not leave existing interfaces as they are, and make the *new* behaviour the opt-in "requires changes to programmes" behaviour?

It's much easier to update the programmes you know want the new behaviour, than find all the programmes that don't want it. If you miss programmes using the first approach, they just continue with the behaviour they're already running with anyway. For the latter approach, you don't even know if you can find all such programmes - people and organisations have private and internal code. You can't just scan code in free software distro repositories and hope to catch everything.

Distributors ponder a systemd change

Posted Jun 8, 2016 9:02 UTC (Wed) by matthias (subscriber, #94967) [Link] (2 responses)

This will not work. For more than 99% of the processes, there should be the new behaviour: getting killed if normal shutdown fails. E.g. the firefox instance that fails to shutdown when it gets SIGHUP but uses 100% CPU time instead. Problem of the old interface is that a process tells that it does not want to get terminated by intercepting SIGHUP. Unfortunately every nontrivial process (even those that do not want to survive) has to intercept SIGHUP to do a clean shutdown. If this shutdown fails the process is left running.

The old behaviour is only useful to very few programs (I always see screen, tmux, nohup mentioned), which could easily be fixed.

Distributors ponder a systemd change

Posted Jun 8, 2016 14:29 UTC (Wed) by paulj (subscriber, #341) [Link] (1 responses)

So many processes trap shutdown to do cleanup, but cleanup may fail.

Pray tell, how does the systemd change to just kill them outright help with that?

Distributors ponder a systemd change

Posted Jun 8, 2016 15:33 UTC (Wed) by anselm (subscriber, #2796) [Link]

Programs that want to clean up after themselves must already trap SIGTERM (which is the signal that kill(1) and friends send by default). If their cleanup process fails such that they hang or loop rather than exit, or otherwise takes longer than systemd – or for that matter shutdown(8), which follows exactly the same approach as systemd – lets them have, they get bopped on the head with a SIGKILL. This is not exactly breaking new ground.
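
For readers who haven't seen it spelled out, the "polite request, grace period, then SIGKILL" sequence described above looks roughly like this. It is a generic sketch of the pattern, not systemd's or shutdown(8)'s actual code; the helper names and the deliberately stubborn test child are made up for the example.

```python
import signal
import subprocess
import sys

# A child that ignores SIGTERM, standing in for a program whose cleanup hangs.
STUBBORN = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(30)\n"
)


def spawn_stubborn_child():
    child = subprocess.Popen(
        [sys.executable, "-c", STUBBORN], stdout=subprocess.PIPE
    )
    child.stdout.readline()  # wait until SIGTERM is actually being ignored
    return child


def stop_with_grace(child, grace_seconds):
    """Send SIGTERM, wait for the grace period, then fall back to SIGKILL."""
    child.send_signal(signal.SIGTERM)
    try:
        child.wait(timeout=grace_seconds)
        return "exited after SIGTERM"
    except subprocess.TimeoutExpired:
        child.kill()  # SIGKILL cannot be caught or ignored
        child.wait()
        return "needed SIGKILL"
```

A well-behaved program exits within the grace period and never sees the SIGKILL; only the ones that hang or loop get the second signal.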

Distributors ponder a systemd change

Posted Jun 8, 2016 11:20 UTC (Wed) by nowster (subscriber, #67) [Link]

It's not just screen and tmux -- mosh is also affected.

Distributors ponder a systemd change

Posted Jun 8, 2016 22:21 UTC (Wed) by error27 (subscriber, #8346) [Link]

"They attempted to make the change after ensuring the affected programs (tmux, screen, nohup) are updated."

That's not true. They made the change first then tried to fix tmux after everyone got enraged already. Read the tmux RFE from the article and look at the date on it. The tmux RFE was from May 27 and the Debian bug was filed on May 26.

That's the way it should have been done, but it's not the way it happened.

Paper over bugs in other tools

Posted Jun 8, 2016 8:29 UTC (Wed) by NAR (subscriber, #1313) [Link] (2 responses)

> It is easy to say that such programs should simply be fixed, but, in the real world, sometimes one has to stop playing whack-a-mole and just pave the field instead.

I wish they had the same attitude with pulseaudio and the ALSA bugs, I wish...

Paper over bugs in other tools

Posted Jun 8, 2016 17:42 UTC (Wed) by kyrias (guest, #101770) [Link]

They sort of did, which was why many people in the beginning literally couldn't use pulseaudio because it didn't accommodate the ALSA drivers being bad.

Paper over bugs in other tools

Posted Jun 12, 2016 10:54 UTC (Sun) by micka (subscriber, #38720) [Link]

"They"? You mean the 5 % (or is it 0,5 %?) of developers both projects (systemd and PA) have/had in common at different times?

Multi-user system vs multi-uid system

Posted Jun 8, 2016 8:35 UTC (Wed) by ju3Ceemi (subscriber, #102464) [Link] (2 responses)

Linux is a multiuser system, as it can handle many users at the same time.
However, in the real world, Linux is multi-uid: on my PC, I am the one true user.
On many servers, same stuff: you have one real user (say: adminsys).
As the only user on my PC, that PC is mine. When I run code in the background, I expect it to keep running. Why on earth should my PC do anything against me? It is mine; it shall not do anything against my will.

You don't have many real people behind a single PC, most of the time.

That systemd feature is only "useful" for multiuser systems, but is evil for multi-uid systems.

Multi-user system vs multi-uid system

Posted Jun 8, 2016 22:52 UTC (Wed) by droundy (subscriber, #4559) [Link] (1 responses)

On the contrary, I have on numerous times (and not just in the last few years) had bugs where various GUI programs misbehaved, and when I logged out, or restarted gdm, the processes persisted, such that when I logged back in things weren't working right. This would be fixed with a killall of all my processes from a virtual terminal, or by rebooting the computer. This new default would cause these sorts of issues to always get fixed simply by logging out and logging back in again.

Buggy code doesn't have to be malicious in order to cause problems, and you don't need a multi-user system to benefit from this change. True, you could always reboot every time you log out, and maybe that is your habit if you're a Windows user. But I don't have that habit, and I prefer my computer to behave reliably and predictably, where I consider "doing the same thing each time I log in" an aspect of reliable behavior.

Multi-user system vs multi-uid system

Posted Jun 9, 2016 11:16 UTC (Thu) by HenrikH (subscriber, #31152) [Link]

I have the exact same experience: after the kids have played around with their account and logged out, there can sometimes be lots of rogue processes running (mostly wine processes).

Distributors ponder a systemd change

Posted Jun 8, 2016 10:00 UTC (Wed) by szbalint (guest, #95343) [Link] (3 responses)

systemd is a social phenomenon.

We've had ancient unix plumbing that was really starting to show its age and while there were some attempts to bring contemporary code into the whole lower layer between the kernel and userland (and system startup), it was not really a coordinated push towards something new. systemd entered that vacuum and pushed hard, solved some problems, made some things convenient to do and won market share with this approach.

The whole problem is that systemd is the php/mysql equivalent of init systems combined with a tendency to borg more and more components into its sphere of influence. Sure, plenty of people use php and mysql too and I don't mean to ignite old flamewars but it's still important to realise that people use software like that _despite_ their technological shortcomings, not because of their technological merits. PHP and MySQL both got bootstrapped into widespread use because they entered a market vacuum, were widely available, easy to start using and were heavily geared towards "keep going" no matter what. The fact that they belong to the "gentleman's C" or "barely adequate" level of software didn't really matter and still doesn't to some extent.

On the macro level I guess it doesn't matter* that much that systemd is making questionable design decisions, it's something that distributions and developers can sort of work around and mitigate for now, people can keep passing --without-stupid-decision-x and their ilk like with OpenSSL and gcc to some extent. What systemd is, is a lost opportunity / opportunity cost. It's not that it worsened the status quo much, it's that we're not building something that's clearly superior to both ancient unix plumbing and systemd, that would actually matter on the large scale and improve things.

*with the caveat that as systemd keeps borging stuff with a willingness to force change on software orthogonal to it and increasing the monolithic complexity, things might start breaking pretty badly

Lennart Poettering is I think the perfect example of someone just smart enough to dig himself into a deep well and with surefire convictions making him incapable of stopping. This current change about killing processes on logout cannot be reasonably justified from a security perspective, killing processes on logout doesn't make things more secure at all. That's a nonexistent security barrier even to the stupidest malicious actor. The fact is that if a user has access to a system, it means control over a huge range of things on that system regardless of whether that user is currently logged in or not. The security barrier is at the point when a user gets deleted, then it makes sense to kill their processes and audit as many resources as the user had access to.

Security is just an excuse in this case; misbehaving gnome processes that do not handle SIGHUP are the real reason for this change, and this change is the direct equivalent of MySQL silently inserting '0000-00-00 00:00:00' as a date on invalid data: an attempt to brush problems under the carpet, where it inevitably leads to more problems, instead of fixing it by rejecting that kind of thing at the source of the problem.

Distributors ponder a systemd change

Posted Jun 8, 2016 11:22 UTC (Wed) by niner (subscriber, #26151) [Link] (2 responses)

If you feel that strongly about systemd's design decisions, feel free to find like-minded people and start work on a replacement that's better engineered in your eyes. To stay with your examples: MySQL, despite its huge success, is now losing to PostgreSQL, which in my eyes is a superior solution in nearly every regard. PHP is similarly no longer the rising star it used to be. So your own examples contradict your conclusion of lost opportunity.

However, what really will keep a superior solution from appearing, is if you never start building it.

Distributors ponder a systemd change

Posted Jun 8, 2016 11:35 UTC (Wed) by szbalint (guest, #95343) [Link] (1 responses)

How long did it take for Postgres to displace MySQL? That's the opportunity cost.

(sidenote: I don't think the "why don't you do it better, then?" response is reasonable. Recognising that a situation is bad and being able to fix it are two different things.)

Distributors ponder a systemd change

Posted Jun 8, 2016 11:39 UTC (Wed) by niner (subscriber, #26151) [Link]

If you don't know what a better solution would even look like, how do you know that the current one is bad? How can you talk about "questionable design decisions", "borging stuff", "increasing the monolithic complexity", and things that "might start breaking pretty badly" if you don't know how to improve it?

Please don't shaft scientific computing users!

Posted Jun 8, 2016 12:07 UTC (Wed) by rsidd (subscriber, #2582) [Link] (14 responses)

The scientific community were among the earliest adopters of Linux. I got introduced to it in grad school in 1994, and consequently never used any version of Windows (or MacOS) as my primary OS. It is commonplace to leave a job running in the background, via nohup, screen/tmux, or whatever. These are work habits deeply ingrained in finger memory. It is necessary because we often access the same computers both locally (at work) and remotely (from home or while travelling). Also, we all know how to kill unwanted jobs that failed to die. To show the finger to this community that has not only used, but contributed to developing, Linux, for the purposes of chasing after a mythical "desktop" market is very disappointing.

Please don't shaft scientific computing users!

Posted Jun 8, 2016 12:52 UTC (Wed) by pizza (subscriber, #46) [Link] (13 responses)

> The scientific community were among the earliest adopters of Linux. [...] It is commonplace to leave a job running in the background, via nohup, screen/tmux, or whatever.

Exactly -- you don't expect processes to just sit around because you merely backgrounded them; you had to explicitly request this behavior in advance (i.e. nohup/screen/etc.) or you had no expectation of it remaining active when you logged out.

Heck, this was one of the first things I learned when I was but a wee lad some twenty years ago with my first exposure to Unix (SunOS) in an educational/scientific setting.

What will happen is that these traditional mechanisms will be extended to speak the right incantations to make sure nothing changes from an end-user's perspective -- except that if you don't explicitly request a process linger past logout, it *will* get killed, instead of proceeding in a schrodinger-ish state. Until then distros won't enable this feature by default.

Please don't shaft scientific computing users!

Posted Jun 8, 2016 14:28 UTC (Wed) by rsidd (subscriber, #2582) [Link] (12 responses)

> What will happen is that these traditional mechanisms will be extended to speak the right incantations to make sure nothing changes from an end-user's perspective

Looking at the tmux bug report inspires little confidence in this prediction. Nor does the quote from the article "So it may be some time before even the programs that are explicitly intended to run after logout are able to work transparently in this manner." Knowing corbet's style of writing this is likely an understatement.

> Until then distros won't enable this feature by default.

Which distro has promised this, so far?

Like it or not (and I know Lennart P. doesn't like it), other unixen do exist, and many of the programs concerned are cross-platform. The most one can hope for is distro-level patching for some of the most popular ones.

Please don't shaft scientific computing users!

Posted Jun 8, 2016 16:06 UTC (Wed) by pizza (subscriber, #46) [Link] (1 responses)

> Looking at the tmux bug report inspires little confidence in this prediction.

Oh, please. A proposal was rejected after determining there was a better way to achieve the desired goals.

> Which distro has promised this, so far?

From TFA, Debian, Arch, and Gentoo?

And, although the official decision is still pending, Fedora will likely follow.

Please don't shaft scientific computing users!

Posted Jun 8, 2016 20:13 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

There was no resolution in that tmux bug.

Please don't shaft scientific computing users!

Posted Jun 8, 2016 18:48 UTC (Wed) by alankila (guest, #47141) [Link]

You can replace tmux with a wrapper that invokes systemd-run whatever stuff to make the magic happen. I predict it's at most 3 lines of shell scripting, and then tmux has been "fixed", whether upstream is receptive to systemd or not.
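
A rough sketch of what such a wrapper might look like, written in Python rather than shell. It assumes that `systemd-run --scope --user` is a suitable invocation for starting a command in its own transient scope under the user's service manager; the helper names `build_scoped_command` and `exec_scoped` are invented for illustration, and whether the resulting scope actually survives logout depends on local logind settings (lingering, for example).

```python
import os


def build_scoped_command(argv):
    """Prefix argv so the command is started in its own transient user scope."""
    # Assumption: --scope runs the command in a new scope unit, and --user
    # talks to the per-user service manager instead of the system one.
    return ["systemd-run", "--scope", "--user", *argv]


def exec_scoped(argv):
    """Replace the current process with the scoped command (does not return)."""
    command = build_scoped_command(argv)
    os.execvp(command[0], command)
```

Installed ahead of the real binary in `PATH`, a script calling `exec_scoped(["/usr/bin/tmux", *sys.argv[1:]])` would be the "wrapper" in question; the path to tmux is likewise an assumption.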

Please don't shaft scientific computing users!

Posted Jun 10, 2016 14:48 UTC (Fri) by paulj (subscriber, #341) [Link] (8 responses)

What about in-house applications, all the proprietary applications that can exist behind the scenes in companies (I used to work in a multi-national involved in merchant transaction processing; there is a whole _world_ of weird proprietary apps out there)?

This is a fairly major Unix environment break. I'd have thought it should be via opt-in APIs - not opt-out.

Please don't shaft scientific computing users!

Posted Jun 10, 2016 15:29 UTC (Fri) by rsidd (subscriber, #2582) [Link] (7 responses)

>What about in-house applications, all the proprietary applications that can exist behind the scenes in companies

I believe the answer is "screw that, we care about the Gnome desktop".

Please don't shaft scientific computing users!

Posted Jun 10, 2016 15:55 UTC (Fri) by pizza (subscriber, #46) [Link] (6 responses)

> I believe the answer is "screw that, we care about the Gnome desktop".

No, the answer is "no matter how long we wait for folks to voluntarily update their stuff, it will never be long enough."

Meanwhile the rest of the world moves on.

Please don't shaft scientific computing users!

Posted Jun 10, 2016 16:12 UTC (Fri) by rsidd (subscriber, #2582) [Link] (5 responses)

I think that's my answer in different words? At least for those of us who don't see Gnome as progress. For some of us, if it works, don't fix it, unless there are security considerations.

I'd be more sympathetic if Gnome had produced a desktop that was widely regarded as awesome. Instead, in something like 17 years now, Gnome has produced Gnome1 which was a bad knockoff of KDE/CDE; Gnome2 that was a quite useful thing for some people; Gnome3 that threw out Gnome2 in favour of chasing some mythical unicorn. In that time, diehard Unix users moved to Linux and the BSDs, but many, eventually, to OS X. And desktop users of all persuasions (including Windows) became a minority in the face of the mobile onslaught. Where Linux dominates, but in a form (Android) that has little to do with Unix.

If you want to chase the market, just adopt Android on the desktop already. If you want to go after people who like the Unix way, stop pulling stunts like this that make decades-old habits suddenly stop working. Really.

Please don't shaft scientific computing users!

Posted Jun 10, 2016 17:32 UTC (Fri) by pizza (subscriber, #46) [Link] (4 responses)

> In that time, diehard Unix users moved to Linux and the BSDs, but many, eventually, to OS X.

That only demonstrates that said "diehard unix users" actually prioritize "JustWorks" or even "OOooshiny" far higher than "respecting unix conventions".

Please don't shaft scientific computing users!

Posted Jun 10, 2016 19:30 UTC (Fri) by flussence (guest, #85566) [Link] (3 responses)

Most Linux desktops are now in the unenviable position of lacking all three of those qualities, and more.

Please don't shaft scientific computing users!

Posted Jun 10, 2016 20:51 UTC (Fri) by pizza (subscriber, #46) [Link] (2 responses)

> Most Linux desktops are now in the unenviable position of lacking all three of those qualities, and more.

Oh? By any quantifiable quality, things are better now than they've ever been.

As for touchy-feely stuff like "respecting UNIX conventions". Twenty years ago in my academic days, I was taught that the overriding principle for UNIX was the simplicity of implementation. That was prioritized over everything else, including correctness, performance, and ease-of-use (for both developers and end-users)

Oh, for sake of discussion, I'll refer to this list:

http://c2.com/cgi/wiki?UnixDesignPhilosophy

I'll note that even UNIXen at the time that list was published didn't really adhere to UNIX conventions all that well. Nor does the entire notion of GUIs.

Please don't shaft scientific computing users!

Posted Jun 11, 2016 21:25 UTC (Sat) by flussence (guest, #85566) [Link]

I know it was worded the way it was, but I'm not entirely stuck on the idea that removing those is bad. Some of the awful acts committed in the mid-00s in the name of “shiny” are best left forgotten (and some forms of “just works” too; nobody'll miss ndiswrapper!)

Unix (the abstract ideal people usually talk about) shouldn't be confused with UNIX (sometimes spelled with an ®). I think most here would see ignoring the latter's conventions as a feature, on the other hand I agree with and often design systems according to that list. There are even a few points in there I didn't know about, but was doing anyway...

Please don't shaft scientific computing users!

Posted Jun 13, 2016 13:17 UTC (Mon) by paulj (subscriber, #341) [Link]

So much better that I switched back to WindowMaker (running under mate-session) the other year, because session management works, at least for the stuff I care more about like non-GNOME/gtk apps, older apps, xterms, etc., and, for all its flaws, it at least doesn't heave everything up and over on me every 6 months.

If I really wanted to shake up my desktop, I'd probably just go for Android as my desktop, and use some rootless Xserver for whatever older apps I needed. I'd have tried that already, but I can't just 'yum install android-desktop' on my Fedora desktop and laptop.

But what change is it

Posted Jun 8, 2016 12:19 UTC (Wed) by itvirta (guest, #49997) [Link] (1 responses)

I notice that neither the title of this article, nor the short text that shows on the (non-logged-in) front page, seems to say anything about what kind of a change it is that is being pondered and discussed.

Dear Editor, is this on purpose?
I ask, because it reminds me of the very bad habits I've seen in some mainstream news sources.

But what change is it

Posted Jun 8, 2016 13:06 UTC (Wed) by corbet (editor, #1) [Link]

Not on purpose, but I'm not sure it was a bad thing either? The article explains the situation quickly enough, I think.

The working title was "the systemd process-killing apocalypse," but I decided it needed to be a bit more restrained when the article went out.

For now

Posted Jun 8, 2016 13:52 UTC (Wed) by kh (guest, #19413) [Link] (1 responses)

--without-kill-user-processes flag to restore the default to "no"... The other is to set the KillUserProcesses option to "no" in a distribution-supplied logind.conf file

Which seems fine, unless you suspect they will unilaterally remove those two knobs shortly after the enterprise distros adopt this.
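
For anyone who wants to check which way that second knob is set on a given machine, here is a minimal sketch. It assumes the option lives in the `[Login]` section of logind.conf (normally /etc/systemd/logind.conf); the function name is made up, the compiled-in default is passed in rather than guessed, and drop-in directories are ignored entirely.

```python
import configparser


def kill_user_processes_setting(conf_text, compiled_default):
    """Return the KillUserProcesses value from logind.conf text, or the default."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(conf_text)
    if parser.has_option("Login", "KillUserProcesses"):
        # getboolean() understands yes/no, true/false, on/off and 1/0
        return parser.getboolean("Login", "KillUserProcesses")
    # Commented-out or missing: whatever the build-time default was
    return compiled_default
```

A commented-out line such as `#KillUserProcesses=no` counts as "not set" here, so the result then depends only on how the distributor built logind.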

For now

Posted Jun 8, 2016 14:02 UTC (Wed) by pizza (subscriber, #46) [Link]

> Which seems fine, unless you suspect they will unilaterally remove those two knobs shortly after the enterprise distros adopt this.

So, on what are you basing this suspicion? When has systemd ever unilaterally removed something that was part of an external API or configuration option?

There are many things reasonable people can dislike about systemd, but if you're going to claim they're operating in bad faith, you're going to have to back up your assertion with *something*

Distributors ponder a systemd change

Posted Jun 8, 2016 15:35 UTC (Wed) by jberkus (guest, #55561) [Link] (6 responses)

So, while I think the way the change was introduced is fine, I think the decision is bad and the change should be reverted in systemd.

The problem I have with this change ... as with a lot of systemd's foibles ... is that it's motivated by the desktop use-case, primarily Gnome. Thing is, there are a lot more Linux systems running on servers than on desktops, and a lot of the existing Linux desktops don't run Gnome. Yet all Linux users are being asked to change how their applications work because Gnome can't control its processes. That's equivalent to forcing everyone to wear face masks because a few people have bad breath. Get those people some Scope and stop bothering the rest of us.

I don't buy the security argument for two reasons. First, this is a system-wide parameter which doesn't let admins discriminate among users, making it practically useless. Second, if it was really a security issue, it belongs in SELinux, where we can have real policies for persistent sessions instead of this ad-hoc BS. So I believe that the security argument is just a smokescreen for "we can't fix Gnome so here's a big rug to cover the issues."

Distributors ponder a systemd change

Posted Jun 8, 2016 16:41 UTC (Wed) by rahulsundaram (subscriber, #21946) [Link] (2 responses)

> The problem I have with this change ... as with a lot of systemd's foibles ... is that it's motivated by the desktop use-case, primarily Gnome

This is definitely not true. You can find plenty of examples, even in this comment thread, of programs that don't exit cleanly. I do think distributions are doing the right thing in disabling this feature for now, but it is definitely not a desktop-only problem.

>Second, if it was really a security issue, it belongs in SELinux

That might be true if SELinux was more widely adopted. Unfortunately, that isn't the case.

Distributors ponder a systemd change

Posted Jun 8, 2016 17:05 UTC (Wed) by jberkus (guest, #55561) [Link] (1 responses)

> >Second, if it was really a security issue, it belongs in SELinux

> That might be true if SELinux was more widely adopted. Unfortunately, that isn't the case.

That's a "but the light's better over here" argument. Poettering is pushing this because it's "the right thing to do". But the *right* thing to do is for it to be in SELinux, where there can be actual admin policies around process-killing instead of just an on/off switch. So we should either do the expedient thing to do (which is to leave the defaults where they are) or the right thing to do (which is to put this in SELinux with hooks in systemd to support it). This change is neither right, nor expedient.

Distributors ponder a systemd change

Posted Jun 8, 2016 17:12 UTC (Wed) by rahulsundaram (subscriber, #21946) [Link]

> But the *right* thing to do is for it to be in SELinux

I don't see why SELinux is the obviously right place to do it.

> where there can be actual admin policies around process-killing instead of just an on/off switch

It isn't just a switch in systemd. You can have admin policies in polkit.

Distributors ponder a systemd change

Posted Jun 8, 2016 17:11 UTC (Wed) by mjg59 (subscriber, #23239) [Link]

I'm not entirely clear on how SELinux would help here. You can't change the context of a process without its cooperation, and you still don't have any kind of revoke(). How would you restrict the ability of processes to continue to use resources they had access to before the user logged out?

Distributors ponder a systemd change

Posted Jun 8, 2016 20:43 UTC (Wed) by barryascott (subscriber, #80640) [Link]

Searching for logind.conf, I see that this is configurable per user:

KillOnlyUsers=, KillExcludeUsers=

These settings take space-separated lists of usernames that override the KillUserProcesses= setting. A user name may be added to KillExcludeUsers= to exclude the processes in the session scopes of that user from being killed even if KillUserProcesses=yes is set. If KillExcludeUsers= is not set, the "root" user is excluded by default. KillExcludeUsers= may be set to an empty value to override this default. If a user is not excluded, KillOnlyUsers= is checked next. If this setting is specified, only the session scopes of those users will be killed. Otherwise, users are subject to the KillUserProcesses=yes setting.
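
One way to read that paragraph as a decision procedure, sketched in Python. This is only an interpretation of the quoted manual text, not code taken from systemd; the function and parameter names are invented, `None` stands for "setting not present", and an empty list stands for "set to an empty value".

```python
def session_scope_is_killed(user, kill_user_processes,
                            kill_only_users=None, kill_exclude_users=None):
    """Decide whether a user's session scope is killed at logout."""
    # If KillExcludeUsers= is not set at all, "root" is excluded by default;
    # an explicitly empty list removes that default.
    excluded = ["root"] if kill_exclude_users is None else kill_exclude_users
    if user in excluded:
        return False
    # KillOnlyUsers= is checked next: if given, only those users are killed.
    if kill_only_users:
        return user in kill_only_users
    # Otherwise the global KillUserProcesses= switch decides.
    return kill_user_processes
```

Under this reading, `session_scope_is_killed("alice", True, kill_exclude_users=["alice"])` is false, which would answer the objection that the switch cannot discriminate among users.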

Distributors ponder a systemd change

Posted Jun 9, 2016 4:36 UTC (Thu) by marcH (subscriber, #57642) [Link]

> The problem I have with this change ... as with a lot of systemd's foibles ... is that it's motivated by the desktop use-case, primarily Gnome. Thing is, there are a lot more Linux systems running on servers than on desktops, and a lot of the existing Linux desktops don't run Gnome. Yet all Linux users are being asked to change how their applications work because Gnome can't control its processes. That's equivalent to forcing everyone to wear face masks because a few people have bad breath. Get those people some Scope and stop bothering the rest of us.

Nice one, thank you.

In an ideal world, upstream would come pre-configured to ease the work of the majority of its users. In the real-world... thank God we're spoilt with our choice of Linux distributions and all the hard and ungrateful work they're doing.

Distributors ponder a systemd change

Posted Jun 8, 2016 15:41 UTC (Wed) by ccchips (subscriber, #3222) [Link]

I have a tendency to look at Linux in a schizophrenic way; sometimes I play at low levels, sometimes I pretend I don't know anything about computers and just use the desktop or terminal. I also have all kinds of UNIX books around here, and I have no idea how often I've seen the "nohup" command mentioned in those books. I would bet there are literally *hundreds* of UNIX/Linux books like that.

I am sorry to have to say this, but this change doesn't sit well with me. If it goes through, it should be EXTREMELY EASY for a person who has installed Linux to fix, and the instructions should be written in big bold letters by the distributor.

If you Linux developers don't want to lose users because of frustrating problems caused by one guy, I suggest you remember that some of us are still out here boosting Linux, and often those who try it aren't all that happy when they get frustrated. There are a lot of packages that don't "just work" on Linux, and that ain't good. The distributors had better make sure software continues to work as expected if you don't want things to get worse.

Distributors ponder a systemd change

Posted Jun 8, 2016 16:02 UTC (Wed) by joey (guest, #328) [Link] (1 responses)

Here's an example of a program's SIGHUP handler misbehaving. http://bugs.debin.org/825772
So no, it's not just some broken gnome thing, it's things like alpine that have had decades to get this right and have instead gotten it wrong.

I hope that at least one of screen or tmux gets support for systemd's API, because I'd like to re-enable KillUserProcesses on my servers eventually.

Distributors ponder a systemd change

Posted Jun 8, 2016 16:08 UTC (Wed) by joey (guest, #328) [Link]

(really http://bugs.debian.org/825772 of course)

Distributors ponder a systemd change

Posted Jun 8, 2016 16:05 UTC (Wed) by mezcalero (subscriber, #45103) [Link] (8 responses)

Small technical correction: logind/PID1 actually *do* send SIGHUP to the session processes (in addition to SIGTERM) when trying to terminate them. That's because shells block SIGTERM and only terminate on SIGHUP.

Lennart

Distributors ponder a systemd change

Posted Jun 8, 2016 20:46 UTC (Wed) by pbonzini (subscriber, #60935) [Link] (5 responses)

Would that cause problems if some daemon rereads its configuration file and it has changed in ways that the user was not yet expecting to be activated? (For example, they wanted to change two configuration files and then restart two services, but their SSH session got terminated midway.)

Distributors ponder a systemd change

Posted Jun 9, 2016 0:54 UTC (Thu) by anselm (subscriber, #2796) [Link] (4 responses)

Daemons, by definition, have no business running in user sessions. SIGHUP is used to prod daemons to reread their configuration exactly because they don't have a controlling terminal and are therefore immune against the original use of SIGHUP, namely their session going away – this means that, for a daemon, the signal is available to be used for this.
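
The reload-on-SIGHUP convention described here can be sketched as follows. This is a generic illustration rather than any particular daemon's code; the `state` dictionary, the function names and the `load_config` callback are all assumptions made for the example.

```python
import signal


def install_reload_handler(state):
    """Arrange for SIGHUP to request a configuration reload."""
    def _on_sighup(signum, frame):
        # Only record the request; the main loop does the actual work
        state["reload_requested"] = True
    signal.signal(signal.SIGHUP, _on_sighup)


def maybe_reload(state, load_config):
    """Called from the main loop: reload the config if SIGHUP was seen."""
    if state.get("reload_requested"):
        state["reload_requested"] = False
        state["config"] = load_config()
        return True
    return False
```

A process written this way survives SIGHUP instead of exiting on it, which is why the same signal arriving at logout has a different effect on it than on an ordinary program.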

Distributors ponder a systemd change

Posted Jun 9, 2016 10:04 UTC (Thu) by pbonzini (subscriber, #60935) [Link] (3 responses)

Then why does gpg-agent reread the configuration file if it receives SIGHUP? And that's just the first manpage I opened...

Distributors ponder a systemd change

Posted Jun 9, 2016 11:21 UTC (Thu) by HenrikH (subscriber, #31152) [Link] (2 responses)

Because gpg-agent is a daemon although it's invoked by the user and not by init.

Distributors ponder a systemd change

Posted Jun 9, 2016 11:24 UTC (Thu) by pbonzini (subscriber, #60935) [Link] (1 responses)

Hence the parent assertion that "Daemons, by definition, have no business running in user sessions" is wrong.

Distributors ponder a systemd change

Posted Jun 9, 2016 19:52 UTC (Thu) by HenrikH (subscriber, #31152) [Link]

Well, there can always be exceptions to a rule; the most dangerous words are "always" and "never" :-). That said, something like ssh-agent should be a prime candidate for the systemd socket activation feature, although that might create problems on a multi-user system (I don't know, since I have not looked at ssh-agent at all).

Distributors ponder a systemd change

Posted Jun 9, 2016 10:48 UTC (Thu) by diegor (subscriber, #1967) [Link] (1 responses)

I wonder if it would be better to use SIGHUP alone. More or less, that is the reason SIGHUP exists: "dear process, your user is no longer there".

Another question: what happens if the process just ignores SIGTERM?

Distributors ponder a systemd change

Posted Jun 9, 2016 11:22 UTC (Thu) by HenrikH (subscriber, #31152) [Link]

Then it gets killed with SIGKILL
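
Putting this subthread together (SIGTERM plus SIGHUP first, SIGKILL if that is ignored), the sequence can be sketched as below. This is not logind's code: the grace period, the function names and the helper that spawns a test child are all assumptions for illustration.

```python
import signal
import subprocess
import sys


def terminate_with_escalation(proc, grace_seconds=2.0):
    """Ask a child to exit politely, then force it if it does not comply."""
    proc.send_signal(signal.SIGTERM)
    proc.send_signal(signal.SIGHUP)  # shells may only react to this one
    try:
        proc.wait(timeout=grace_seconds)
        return "exited"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL cannot be caught or ignored
        proc.wait()
        return "killed"


def spawn_sleeper(ignore_signals=False):
    """Start a child that sleeps, optionally ignoring SIGTERM and SIGHUP."""
    setup = (
        "import signal;"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN);"
        "signal.signal(signal.SIGHUP, signal.SIG_IGN);"
        if ignore_signals else ""
    )
    code = "import time;" + setup + "print('ready', flush=True);time.sleep(60)"
    proc = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE)
    proc.stdout.readline()  # wait until the child has set up its handlers
    return proc
```

Calling `terminate_with_escalation(spawn_sleeper(ignore_signals=True), 0.5)` should fall through to the SIGKILL branch, while an ordinary child exits during the grace period.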

Distributors ponder a systemd change

Posted Jun 8, 2016 16:15 UTC (Wed) by pizza (subscriber, #46) [Link]

> One is to build logind with the --without-kill-user-processes flag to restore the default to "no"; that is what Arch Linux and Gentoo have chosen to do, for example.

Ironically, this creates a functional regression for those that had already chosen to utilize this feature. It makes far more sense to just flip the default back, instead of disabling this feature altogether.

Distributors ponder a systemd change

Posted Jun 8, 2016 18:13 UTC (Wed) by cyperpunks (subscriber, #39406) [Link] (1 responses)

Don't forget the at command; I have used it for ages to run large simulations in the background after logging out:
$ at -m -f run.sh now

Will the systemd changes destroy such usage?

Distributors ponder a systemd change

Posted Jun 8, 2016 19:37 UTC (Wed) by dtlin (subscriber, #36537) [Link]

I don't even have at, either on my personal Arch Linux system or on the CentOS 7 I use for work.
But as long as atd is running, it should be fine. Looks like upstream has a systemd service file for atd too.

Distributors ponder a systemd change

Posted Jun 8, 2016 19:36 UTC (Wed) by flewellyn (subscriber, #5047) [Link] (2 responses)

I can see a use-case for this, definitely, but perhaps the workarounds should be cleaner?

Perhaps a program that lets you start a process in its own control group? Managing control groups to make processes that are explicitly "persistent" by definition?

Distributors ponder a systemd change

Posted Jun 8, 2016 19:56 UTC (Wed) by smcv (subscriber, #53363) [Link] (1 responses)

> Perhaps a program that lets you start a process in its own control group?

That would be systemd-run(1), added in 2013.

Distributors ponder a systemd change

Posted Jun 8, 2016 21:35 UTC (Wed) by flewellyn (subscriber, #5047) [Link]

Ahh, well, in that case, why not just run the desktop processes that were causing problems in a scope that automatically terminates them when the user logs out, but have shell-started processes not be in the same scope?

Distributors ponder a systemd change

Posted Jun 8, 2016 22:31 UTC (Wed) by flussence (guest, #85566) [Link] (5 responses)

I don't entirely understand the full story behind all this, so I'll assume good faith on the part of the systemd developers.

This to me looks like a workaround for Someone Else's badly-engineered software not exiting in a timely manner. I'm not even sure who's to blame for that software, but it must be pretty awful — and widespread — to elicit this kind of nuclear response. It sets an unpleasant precedent in that people writing long-lived processes, who've done nothing wrong, now have an extra non-standard codepath to deal with alongside Android, Windows and OS X.

I thought systemd's whole mission was about pulling up the weeds, not making excuses to keep them there. If the upstream at fault won't cooperate and fix their bugs, why not replace them with code owned by someone less obstinate? That seems objectively better than fracturing the Linux ecosystem to appease them.

Distributors ponder a systemd change

Posted Jun 9, 2016 6:05 UTC (Thu) by matthias (subscriber, #94967) [Link]

Because it is impossible to establish that every piece of software installed on the system is bug-free.

Most software will not need an extra codepath.
- Daemons should be started as daemons. systemd will not kill them.
- screen/tmux and the like should register their sessions with PAM. PAM will take care that systemd does not kill them. This should be done anyway to manage processes such as ssh-agent. Without proper session management they will vanish when the user is logged out, even if they are still needed from inside screen.
- Instead of calling nohup you will need to use systemd-run.

I have not seen any other program so far that is affected. I do not claim that there are none, but there seem to be only very few. All the people calling this a big problem have only mentioned these few examples. If this really is a problem, then I would like to see a few more examples.

Distributors ponder a systemd change

Posted Jun 9, 2016 10:11 UTC (Thu) by dunlapg (guest, #57764) [Link]

I found this an interesting summary of how this became a "thing":

http://lwn.net/Articles/690299/

If it's accurate, it makes it much less justifiable.

Distributors ponder a systemd change

Posted Jun 9, 2016 14:43 UTC (Thu) by ksandstr (guest, #60862) [Link] (2 responses)

The big question is: if this is indeed required by GNOME, KDE, etc. for their ill-behaved spawn, why is it not those desktop environments that set a flag in systemd to have their descendants killed? Why is it the case that screen, tmux, et al. must depend on systemd to avoid that fate?

Per Hanlon's razor, either systemd is again breaking well-working, well-defined userspace (i.e. screen, tmux, nohup'd processes, etc.) just for the dependency-tree influence it yields, or they simply don't know any better than to do a stupid thing and then imperiously double down on it.

Distributors ponder a systemd change

Posted Jun 9, 2016 15:18 UTC (Thu) by johannbg (guest, #65743) [Link] (1 responses)

"well-defined userspace"

Well defined by who?

It's always the case that the programs that are misbehaving are the ones that need fixing, so first establish whether the programs you listed have been technically correctly implemented or are technically misbehaving.

People's "feelings", "history", "workflows", "religion" and whatnot are entirely irrelevant in these types of matters.

So answer this: is systemd technically doing the correct thing?

If the answer is yes, it's doing the technically right thing, then what breaks should be fixed elsewhere.
If the answer is no, it's not doing the technically correct thing, then what breaks should be fixed in systemd.

It's as simple as that.

Distributors ponder a systemd change

Posted Jun 9, 2016 17:00 UTC (Thu) by ksandstr (guest, #60862) [Link]

> Well defined by who?

By practice across multiple independent implementations of Unix. Call it a silent standard.

Distributors ponder a systemd change

Posted Jun 9, 2016 4:38 UTC (Thu) by jcm (subscriber, #18262) [Link]

It is absolute pure and simple nonsense to break Unix behavior by default. This will have no end of fallout and force admins to add more to their laundry list of "things to fix after install to make Linux usable". Sheer folly.

Rant

Posted Jun 10, 2016 16:42 UTC (Fri) by rsidd (subscriber, #2582) [Link] (8 responses)

People keep talking of the modern Gnome/Unity/whatever desktop as "progress", saying remember what it used to be like to use a Linux desktop/laptop, everything "just works" now, therefore just roll over when we nuke the old Unix ways.

This is an utter myth.

Yes, the modern Linux desktop "just works" but that's entirely due to hardware support.

Winmodems were a nightmare. But nobody uses modems anymore.

CD/RW were a nightmare and were the only rewriteable mass storage media. For over 10 years now everyone uses USB mass storage, all devices follow the same protocol, that problem simply went away.

Similarly with webcams, when they all started following a standard USB interface.

Similarly with pretty much all the hardware that one used to fight with. The progress is because of standardisation on hardware interfaces (across various versions of Windows and Mac).

I am not at all discounting the efforts of the kernel developers (and, in some cases, hardware manufacturers) in making sure all these things work perfectly. It is a huge and impressive effort. If it weren't, the BSDs would be competitive with Linux on the desktop and laptop. That they aren't is due entirely to hardware support. (And, ironically, the BSDs supported USB before Linux did. Yet, from my experience, FreeBSD continued to crash reliably when using USB media, well into the 2000s.)

But: NONE OF THIS PROGRESS CAN BE CREDITED TO BREAKING LONGSTANDING UNIX CONVENTIONS.

Many of us were used to Ctrl-Alt-Bksp killing the X session. It was convenient, it is an unlikely key combination to hit accidentally. But some people decided it was dangerous. Well, ok.

Many of us were used to Ctrl-Alt-Function keys giving you consoles. That got disabled too for reasons I don't understand. Well, ok.

But the acquiescence is, I think, more from "this is not worth fighting over" or "too few people really care about this", not so much from "this is an intelligent decision and we should go with it".

When you start killing processes on logout, it's a whole different matter. It affects huge numbers of users who have been brought up on the "Unix way". Not just convenience, but actual work. You risk losing days or weeks of work because you forgot that the powers-that-be changed how basic practices work.

Demanding that commands like nohup, screen, tmux and numerous in-house applications adapt to the new reality, because this is what Gnome developers have decided that Gnome users want, is unbelievable arrogance.

Arguing that all these programs can be trivially fixed (how trivially?) misses the point.

The BSDs have a term, POLA ("principle of least astonishment"), that serves as a policy principle for this sort of thing. Linus has something similar for API breakage in the kernel. Even Microsoft works incredibly hard to ensure compatibility across Windows versions.

Destroying decades-old practice in this manner is complete disrespect for the longest-standing and most loyal users. There is no other way to put it.

Rant

Posted Jun 10, 2016 17:28 UTC (Fri) by pizza (subscriber, #46) [Link]

> Yes, the modern Linux desktop "just works" but that's entirely due to hardware support.

And a great deal of software written on top of that hardware support to automagically configure and utilize said hardware.

> CD/RW were a nightmare and were the only rewriteable mass storage media.

Nightmare how? At worst they were about the same as using them under Windows.

> Similarly with webcams, when they all started following a standard USB interface.

Webcams remain an ongoing source of joy, because even within that "standard" there's plenty of rope for manufacturers to hang themselves with, and the list of workarounds is a mile long at this point.

> But: NONE OF THIS PROGRESS CAN BE CREDITED TO BREAKING LONGSTANDING UNIX CONVENTIONS.

No, when it came to actual hardware, drivers, and low-level OS manipulation, there were never any UNIX conventions. Every UNIX had its own way of doing those things. And still does. (Even the "everything is a file" abstraction wasn't ever true)

Meanwhile, beyond POSIX, there weren't any meaningful conventions for building higher-order systems. Sessions? IPC? Everyone had their own mechanisms, none compatible. Even from a GUI perspective, beyond raw xlib, you had nothing you could count on being universal.

Rant

Posted Jun 15, 2016 3:00 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (6 responses)

> Many of us were used to Ctrl-Alt-Function keys giving you consoles. That got disabled too for reasons I don't understand. Well, ok.

Huh? When was this disabled? Have a reference?

Rant

Posted Jun 15, 2016 6:35 UTC (Wed) by jrigg (guest, #30848) [Link] (5 responses)

It's switched off by default. There's a way to enable it in systemd config but I can't find a link to the info at the moment. I'm using sysvinit-core on my Debian systems which allows it to work.

Rant

Posted Jun 15, 2016 6:53 UTC (Wed) by jrigg (guest, #30848) [Link]

Found it: http://0pointer.de/blog/projects/serial-console.html

Looks like you can re-enable additional ttys by changing NAutoVTs= in logind.conf .

Rant

Posted Jun 15, 2016 8:16 UTC (Wed) by micka (subscriber, #38720) [Link] (3 responses)

Debian systems and one ubuntu system.
All have consoles on Ctrl+Alt+Fn. I don't remember having changed a config setting.

Rant

Posted Jun 15, 2016 8:57 UTC (Wed) by jrigg (guest, #30848) [Link] (2 responses)

When I tried systemd on Debian 8 Testing prior to upgrading my systems, Ctrl+Alt+Fn didn't work. It may have changed by the time Debian 8 was released (which wasn't long after I tested it) but I haven't tried it since.

Rant

Posted Jun 15, 2016 11:56 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (1 responses)

Huh. The default on Fedora is for 6 VTs.

Rant

Posted Jun 15, 2016 15:05 UTC (Wed) by anselm (subscriber, #2796) [Link]

On Debian 8 (Jessie), too. Everything is basically as it used to be.

Distributors ponder a systemd change

Posted Jun 16, 2016 7:30 UTC (Thu) by geek (guest, #45074) [Link]

hey, what's wrong with Lennart's hair style?

Distributors ponder a systemd change

Posted Jun 28, 2016 23:14 UTC (Tue) by mcortese (guest, #52099) [Link] (1 responses)

While I don't have strong feelings for or against this change in systemd, I have something to say about SIGHUP, nohup & disown that many keep promoting in the comments.

In the glory days of UNIX, I would log in to a shell. When starting a user process, the choices were either run until I shut down the shell (i.e. I log out) or keep running forever. The shell would send a SIGHUP on exit, so the choices were actually either obey or ignore such signal. Since 'obey' was the default, 'ignore' had to be specified: nohup was all I needed, back then.

Today I can have several shells open inside one graphical session, and several (graphical or textual) sessions open at once. The behavior I might require from a user process varies from 'run until I shut down the shell', to 'run until I close this session', to 'run until I close all sessions' to 'run forever'. I can't express this variety with nohup. What I need is a replacement for nohup where I can specify the exact 'scope' I want to give to a process. (Ideally, that would be paralleled by new signals besides SIGHUP that express the variety of possible events with the same granularity, but I'm pragmatic enough to understand this will never happen!)

Now, I don't know if systemd-run is the "nohup on steroids" I envision, nor if KillUserProcesses can make up for the missing signals, but pretending that nohup is still the best tool we can hope for is disingenuous!
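
To make the "obey or ignore" choice concrete, here is a small Python sketch of the mechanism nohup relies on: an ignored signal disposition is inherited across exec, so a child started with SIGHUP ignored simply never sees the hangup. The function names `spawn` and `survives_hangup` are invented for this illustration, and the parent sending SIGHUP stands in for the shell exiting. It says nothing about what logind does with KillUserProcesses, which is a separate mechanism from SIGHUP.

```python
import signal
import subprocess


def spawn(argv, immune):
    """Start argv with SIGHUP either ignored (like nohup) or at its default."""
    disposition = signal.SIG_IGN if immune else signal.SIG_DFL

    def set_sighup():
        # Runs in the child between fork and exec; an ignored
        # disposition is kept by the program that gets exec'd.
        signal.signal(signal.SIGHUP, disposition)

    return subprocess.Popen(argv, preexec_fn=set_sighup)


def survives_hangup(immune):
    """Send SIGHUP to a sleeping child and report whether it is still alive."""
    child = spawn(["sleep", "30"], immune)
    child.send_signal(signal.SIGHUP)  # stand-in for the shell hanging up
    try:
        child.wait(timeout=1.0)
        return False  # the child obeyed the signal and exited
    except subprocess.TimeoutExpired:
        child.kill()  # clean up the survivor
        child.wait()
        return True
```

Note that the only knob here is a single on/off switch per process, which is exactly the limitation described above: there is no way to say "until this session ends" versus "until all my sessions end".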

Distributors ponder a systemd change

Posted Jun 29, 2016 13:57 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

That's my issue with the "but SIGHUP!" arguments that have been used. However, adding new signals is even more unportable than the status quo. Instead I'd rather see a nohup-like tool which stuffs a subprocess into a new PAM session which would basically allow it to persist the session and not get killed. I don't know whether such a thing even works with PAM or not though. Needs investigation.

