This week in "As the Technical Committee Turns" [LWN.net]

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 2:22 UTC (Thu) by JdGordy (subscriber, #70103) [Link]

Haha, Awesome write-up!

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 2:38 UTC (Thu) by josh (subscriber, #17465) [Link] (1 responses)

This article nicely summarizes the latest developments on the issue, with exactly the degree of seriousness they deserve.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 20:47 UTC (Thu) by marcH (subscriber, #57642) [Link]

Indeed the most important piece of news seems to be the journalist's desire to move from newspapers to TV (or even cinema?)

The Times They Are a-Changin'

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 2:42 UTC (Thu) by smoogen (subscriber, #97) [Link] (1 responses)

I laughed, I cried, I felt anguish and passion. Two thumbs up from this critic.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 11:02 UTC (Thu) by nix (subscriber, #2304) [Link]

My only objection is that the plot has become too convoluted to be plausible. Can't the writers try harder? This could never happen. :P

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 2:52 UTC (Thu) by vonbrand (subscriber, #4458) [Link]

I predict that the soap opera will go on for a year or so, while systemd silently becomes de facto standard on Debian/Linux. Much noise from Debian/kFreeBSD, with them finally settling on FreeBSD's native system. LWN should consider a franchise on popcorn...

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 4:31 UTC (Thu) by Thue (guest, #14277) [Link] (5 responses)

> So a ballot on the default init system still runs a high risk of coming down to a tie, which may or may not be resolved by the chair's casting vote.

Why the "may not"? Surely Bdale will use his position as chair to break the tie? Is there any reason to think otherwise?

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 4:52 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link] (4 responses)

I read that as part of the wry humor prevalent throughout the article. On a more serious note, however, a chair may choose not to use his position if it is very likely that a GR is going to overrule the TC's decision or non-decision. *sigh*

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 5:30 UTC (Thu) by Thue (guest, #14277) [Link] (3 responses)

> a chair may not chose to use his position if it is very likely that a GR is going to overrule the TC's decision or non decision. *sigh*

Reference? That sounds very unlikely to be a rule to me. How would people know if something was likely to be overruled by a GR? - Such fuzzy criteria don't make sense in a ruleset.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 5:33 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link] (2 responses)

There is of course no such rule. Was merely sharing my perspective.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 8:17 UTC (Thu) by Thue (guest, #14277) [Link] (1 responses)

But the whole point of the vote is to let the TC decide what to do, instead of doing a GR. Abstaining for that reason seems silly and unlikely to me.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 15:32 UTC (Thu) by geofft (subscriber, #59789) [Link]

Multiple TC members have expressed the opinion that they don't mind being overruled by a GR on a simple majority and they're not enthusiastic about making a decision that's at odds with the majority view of the project, if such a majority view genuinely exists.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 7:10 UTC (Thu) by bdale (subscriber, #6829) [Link] (2 responses)

Note that Debian's voting process captures more than just what a voter's top choice is. Thus, even if the first choice is split 4:4, it is entirely possible that the preferential ranking of other choices will determine a winner without requiring the use of a casting vote at all.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 8:32 UTC (Thu) by hirnbrot (guest, #89469) [Link] (1 responses)

However, doesn't it seem likely that the votes will either be "systemd, upstart, the rest" or "upstart, systemd, the rest", which will mean the ordering of the rest _won't_ matter and hence there will be a 4:4 split?

I think I haven't heard a ctte member favoring openrc or the status quo over either.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 10:07 UTC (Thu) by moltonel (guest, #45207) [Link]

> doesn't it seem likely that the votes will either be "systemd, upstart, the rest" or "upstart, systemd, the rest", which will mean the ordering of the rest _won't_ matter and hence there will be a 4:4 split?

Ian has already cast a "SysV, Upstart, OpenRC, Systemd" vote (or something similar, I'm only inferring from the LWN article), so your premise is wrong.

Gaming voting systems is a very natural thing to do; people usually vote for an end result while taking the polls into account, instead of simply voting for their actual beliefs. So Ian may not be the only one who doesn't place systemd and upstart in the top two.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 8:32 UTC (Thu) by nzjrs (guest, #35911) [Link] (1 responses)

It's disappointing that this soap opera is not syndicated nationwide - for then millions more could enjoy Ian's monotonous subterfuge.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 12:30 UTC (Thu) by k3ninho (subscriber, #50375) [Link]

Think interplanetary, these volunteers are building The Universal O.S.

K3n.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 10:54 UTC (Thu) by kugel (subscriber, #70540) [Link] (126 responses)

Ian Jackson's vote (Upstart first, systemd last) is an evil attempt to push upstart as the winning second choice in a 4:4 split. He's poisoning the entire debate.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 11:04 UTC (Thu) by nix (subscriber, #2304) [Link] (125 responses)

Hang on, since when did strategic voting become 'evil'? It's routine in all democracies. I'd say it's probably *ineffective*: it only really works if lots of voters from a very large mass are doing it. In this case it serves only to send a message of annoyance rather than to affect the result.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 11:20 UTC (Thu) by kugel (subscriber, #70540) [Link] (123 responses)

It's evil because systemd is clearly technically superior to openrc and sysvinit. Even Ian has to admit that. He's obviously trying to downvote systemd despite technical arguments in order to push upstart.

Please enlighten me as to why this behavior will have no effect. What will happen if all systemd proponents vote systemd>upstart>rest (3 of them have done so) and all upstart proponents vote upstart>systemd>rest except Ian who votes upstart>rest>systemd? Wouldn't upstart win the vote in this system (not considering Bdale's chair position)?

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 15:05 UTC (Thu) by cortana (subscriber, #24596) [Link] (1 responses)

I believe the result is still a tie, since the result of the individual systemd <-> upstart pairwise race is still 4:4.

Debian Soap Opera vote simulator

Posted Jan 30, 2014 16:59 UTC (Thu) by tialaramex (subscriber, #21167) [Link]

Hmmm. Somebody should make a "Debian Soap Opera vote simulator" to make it easier to work this stuff out. So, what we "ought" to have if people behaved sincerely is:

ABCDE x 4
BACDE x 4

which would clearly be a draw (pending the chairman's casting vote) under any sensible voting system, since no clear preference is expressed.

With the hypothesised Ian "spoiler" vote, you get

ABCDE x 4
BACDE x 3
BCDEA x 1

So, matrix time:

x   mA  mB  mC  mD  mE   how many prefer nY to mX
nA   .   4   7   7   7
nB   4   .   8   8   8
nC   1   0   .   8   8
nD   1   0   0   .   8
nE   1   0   0   0   .

As we expected there is no Condorcet winner.

Now we use the Schulze method, finding "strongest paths", which are paths following voter preferences ranked by their weakest link, and that should resolve things, but I struggle with the descriptions I've found and can't say for sure what they tell us, particularly about A and B which we care most about here. My best attempt is:

x   mA  mB  mC  mD  mE   strongest path from mX to nY
nA   .   4   7   7   7
nB   4   .   8   8   8
nC   1   1   .   8   8
nD   1   1   1   .   8
nE   1   1   1   1   .

So I think what happens is that Ian's "burying" (the scheme of insincerely ranking a strong candidate as weak) just inflates the weakest possible path and achieves nothing. The overall outcome is still a tie. But I am not entirely sure. The original paper by Schulze talks about eliminating paths that are used symmetrically, but if I do that options A and B both lose, which seems unexpected. A simulation (preferably using the same software Debian runs for an actual vote) would be useful.
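The two tables above can be reproduced with a short script. This is only a minimal sketch: it assumes raw preference counts are used as link strengths (as in the tables), and it is not the software Debian runs for an actual vote, whose rules also involve quorum, majority requirements and a default option. The names `pairwise` and `strongest_paths` are made up for illustration.

```python
CANDIDATES = "ABCDE"

# (ranking, number of voters) for the hypothesised "spoiler" scenario
BALLOTS = [("ABCDE", 4), ("BACDE", 3), ("BCDEA", 1)]


def pairwise(ballots, candidates):
    """d[x][y] = number of voters who rank x above y."""
    d = {x: {y: 0 for y in candidates if y != x} for x in candidates}
    for ranking, count in ballots:
        for i, x in enumerate(ranking):
            for y in ranking[i + 1:]:
                d[x][y] += count
    return d


def strongest_paths(d, candidates):
    """Widest-path pass (Floyd-Warshall style) over the raw counts.

    p[x][y] is the strength of the strongest path from x to y, where a
    path is only as strong as its weakest link.
    """
    p = {x: dict(d[x]) for x in candidates}
    for k in candidates:
        for i in candidates:
            for j in candidates:
                if len({i, j, k}) == 3:
                    p[i][j] = max(p[i][j], min(p[i][k], p[k][j]))
    return p


d = pairwise(BALLOTS, CANDIDATES)
p = strongest_paths(d, CANDIDATES)
for x in CANDIDATES:
    row = " ".join("." if x == y else str(p[x][y]) for y in CANDIDATES)
    print("n" + x, row)
```

Under these assumptions p["A"]["B"] and p["B"]["A"] come out equal, so neither A nor B beats the other and the tie stands, which matches the tentative conclusion above.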

FWIW, this problem (usually on a less acute scale) is why you can't/shouldn't deploy something like this for a real political election. It is vital to the actual purpose of democracy that everybody understands how voting works and believes voting determines the outcome. If the system is too complicated, very few voters will have confidence in it, and without that confidence nothing of value has been achieved.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 15:50 UTC (Thu) by ebiederm (subscriber, #35028) [Link] (116 responses)

systemd is not clearly technically superior.

systemd deeply depends on cgroups, the flakiest, most unreliable, and most volatile piece of generic kernel code that I am aware of. Even substantial portions of the cgroup ABI are considered broken by design, and there are active efforts to change it.

A piece of software that requires other software to be used in its most unreliable mode (when that is avoidable) is not generally considered to be good engineering.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 16:01 UTC (Thu) by HelloWorld (guest, #56129) [Link]

> systemd deeply depends on cgroups the flakiest, most unreliable, and most volatile piece of generic kernel code that I am aware of.
Systemd doesn't require any resource controllers to be enabled, just the cgroups feature itself.
http://0pointer.de/blog/projects/cgroups-vs-cgroups.html
Are the bits that systemd requires actually *that* problematic? Also, what would be the alternative? Just letting double-forking processes escape?

From anti-systemd to pro-systemd in the shortest time

Posted Jan 30, 2014 17:38 UTC (Thu) by hummassa (subscriber, #307) [Link] (114 responses)

Read systemd's and upstart's source code.

That's the path I took.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 30, 2014 22:58 UTC (Thu) by cdmiller (guest, #2813) [Link] (111 responses)

Read systemd and upstart, then the following piece of a sysv init script:

#!/bin/sh
test -f /usr/sbin/sshd || exit 0
case "$1" in
  start)
    echo -n "Starting sshd: sshd"
    /usr/sbin/sshd
    echo "."
    ;;
  stop)
    echo -n "Stopping sshd: sshd"
    kill `cat /var/run/sshd.pid`
    echo "."
    ;;
etc...

And which wins the prize for maintainability and troubleshooting on a server?

Not to discount that folks/distros have made significant abominations of init scripts over the years, often layering on more complexity than needed, but still...

From anti-systemd to pro-systemd in the shortest time

Posted Jan 30, 2014 23:47 UTC (Thu) by anselm (subscriber, #2796) [Link] (21 responses)

How does that restart the SSH server if it crashes?

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 2:17 UTC (Fri) by cortana (subscriber, #24596) [Link]

Not to mention keeping track of user sessions via a service that allows resource limits to be applied to them, and/or that allows other code to query who is logged on, what state each logged-on session is in, etc...

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 2:28 UTC (Fri) by vonbrand (subscriber, #4458) [Link] (6 responses)

The real fun starts if the PIDs have rolled around and you end up signalling some random innocent process... happened to me, BTW. Mystifying, to say the least.

Besides, sshd is a very tame daemon: simple configuration, it doesn't start all sorts of long-lived, independent children. But anyway, how do you disable ssh, which presumably means terminate all running ssh sessions?

And what if you want to start it either on demand (via inetd(8)) or on boot... to switch over you have to remember to edit two widely different configuration files, and restart a slew of things. Then you cross your fingers hoping it won't blow up on next boot. Lather, rinse, and repeat. Yes, these kinds of "won't ever happen" and "has never, ever happened to anybody I know" failures have happened to me repeatedly (while just in charge of a smallish computer lab). Yes, the sysvinit rc files (and general infrastructure) have become more robust and easier to manage over the years, but at the cost of horrible complexity. The rc script for ntpd (not much more work to do than what you list) and the shell libraries it uses on some Fedora way back came to some 800 lines of not exactly straightforward, understandable code. And even so, if the machine's clock was out of whack you'd only find out by accident that it wasn't being updated. Yes, because this acme of careful design and stellar engineering has no chance of finding out something is awry.
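The PID-reuse hazard from the start of this comment can be narrowed, though not eliminated, by checking what a PID actually refers to before signalling it. The following is a minimal sketch, assuming a Linux system where /proc/<pid>/comm holds the command name; the function names and the `proc_root` parameter are hypothetical and exist only to make the idea testable.

```python
from pathlib import Path


def read_pid(pidfile):
    """Return the PID stored in pidfile, or None if missing or malformed."""
    try:
        return int(Path(pidfile).read_text().strip())
    except (OSError, ValueError):
        return None


def pid_runs(pid, expected_name, proc_root="/proc"):
    """True if <proc_root>/<pid>/comm names the expected program."""
    try:
        comm = (Path(proc_root) / str(pid) / "comm").read_text().strip()
    except OSError:
        # No such process (or no procfs): never treat that as a match.
        return False
    return comm == expected_name


def safe_to_signal(pidfile, expected_name, proc_root="/proc"):
    """Only approve signalling when the pidfile points at the right program."""
    pid = read_pid(pidfile)
    return pid is not None and pid_runs(pid, expected_name, proc_root)
```

There is still a window between the check and the kill in which the PID could be recycled, so this reduces the risk rather than removing it; tracking processes by cgroup, as discussed elsewhere in this thread, avoids depending on the pidfile at all.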

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 15:18 UTC (Fri) by nix (subscriber, #2304) [Link] (5 responses)

sshd is actually a nice example of a daemon where I do *not* want systemd's otherwise nice feature of cgrouping things up. Its children should be independent, so that stopping sshd for an upgrade doesn't kill all my ssh sessions, even though half of each child is still root-owned and presumably not in the per-user cgroup. systemd clearly handles this, but I have no idea how, there's nothing obvious in the .service files... it's still rather opaque to me.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 15:57 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (2 responses)

I believe "systemctl restart sshd" would be what you want here. I don't think "stop"/"start" is semantically equivalent to "restart". Unfortunately, the docs don't go into the difference other than "stop" using "deactivate". I'd guess one of the "systemd for Administrators" articles explains the difference.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 16:00 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

Actually, looking into it, it is KillMode= which is "process" for sshd.service here. It's explained in systemd.kill(5).

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 18:42 UTC (Fri) by cortana (subscriber, #24596) [Link]

AFAIK restart is the same as stop and then start.

KillMode=process will only kill the original sshd process, true, but if you look at 'systemctl status ssh.service' while you are logged in, you'll see that the child processes aren't in the /system/ssh.service cgroup; they are in their own /user/$uid.user/$session.session cgroup. So even if you set KillMode=control-group, stopping ssh.service won't kill off user sessions.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 18:39 UTC (Fri) by cortana (subscriber, #24596) [Link] (1 responses)

I believe the children are transitioned to per-user-session cgroups by pam_systemd.so, so you will see:

$ systemctl status ssh.service
ssh.service - LSB: OpenBSD Secure Shell server
   Loaded: loaded (/etc/init.d/ssh)
   Active: active (running) since Fri 2014-01-31 09:24:16 UTC; 9h ago
   CGroup: name=systemd:/system/ssh.service
           └─1010 /usr/sbin/sshd

$ pstree 1010
sshd,1010
  └─sshd,8120
      └─sshd,8122,vagrant
          └─bash,8123
              └─pstree,3123 -pua 1010

And yet:

$ loginctl session-status c28
          Since: Fri 2014-01-31 12:19:31 UTC; 6h ago
          Leader: 8120 (sshd)
          Remote: 10.0.2.2
         Service: sshd; type tty; class user
           State: active
          CGroup: systemd:/user/1000.user/c28.session
                  ├─4654 loginctl session-status c28
                  ├─4655 pager
                  ├─8120 sshd: vagrant [priv]
                  ├─8122 sshd: vagrant@pts/4
                  └─8123 -bash

So stopping ssh.service won't kill any processes in the cgroups under /user/.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 9:36 UTC (Mon) by nix (subscriber, #2304) [Link]

Ah, PAM of course. This combined with KillMode= makes me happy that there's a belt-and-braces solution for this. (KillMode= is probably more useful for things like PostgreSQL where the postmaster has to talk to its children to shut down gracefully and would not be happy if they were SIGTERMed first.)

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 23:11 UTC (Sat) by cdmiller (guest, #2813) [Link] (12 responses)

Didn't realize a sysv init script was supposed to restart daemons when they crash...

From anti-systemd to pro-systemd in the shortest time

Posted Feb 2, 2014 0:15 UTC (Sun) by anselm (subscriber, #2796) [Link] (11 responses)

This is the 21st century. The competition does it as a matter of course.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 2, 2014 22:33 UTC (Sun) by dlang (guest, #313) [Link] (10 responses)

actually, this is the 21st century, most places restart the system (i.e. VM), not the process if something crashes.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 9:45 UTC (Mon) by HelloWorld (guest, #56129) [Link]

Yes, and they do that not because there's any sane reason to but because it didn't really work before systemd.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 10:22 UTC (Mon) by anselm (subscriber, #2796) [Link] (8 responses)

And why would that be? Not conceivably because the way server processes are traditionally started offers us no reasonable way to restart just the service in question? Not without lots of additional band aid and baling wire, anyway? (Which suddenly makes just restarting the machine look reasonable by comparison.)

We're now in the fortunate position of being able to solve this problem (among others) without resorting to silly and desperate solutions. It is funny how people are clutching at straws to defend software that was pretty useful when it was new in the 1980s but no longer represents the state of the art.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 11:08 UTC (Mon) by dlang (guest, #313) [Link] (7 responses)

if you really believe that systemd shutting down a service can resolve all possible issues and that the system is guaranteed to be in the same clean state that a rebooted (or ideally, re-created) system would be in, then you are dreaming.

yes, it may be better than without systemd, but that's a long way from being something that's a known state.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 11:44 UTC (Mon) by anselm (subscriber, #2796) [Link] (6 responses)

If you really believe that rebooting a VM magically resolves all possible issues and that the system is guaranteed to be up and running afterwards, then you are dreaming.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 14:35 UTC (Mon) by raven667 (subscriber, #5198) [Link] (5 responses)

Oh come on now, what he said is true, big "cloud service" shops don't do troubleshooting by hand when there is a failure, they just kill the VM and rebuild it from scratch through their configuration management system. Netflix has written papers about this. What might be worth arguing about is whether systemd can replace some home-grown utilities keeping services alive and whether the overall reliability is improved so that you are rebuilding fewer VMs.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 15:25 UTC (Mon) by anselm (subscriber, #2796) [Link] (1 responses)

Yes, and because Netflix doesn't do it it must be a stupid idea in general.

For one, even the big cloud services will prefer not having to rebuild a VM if restarting a process will also work (better than it does based on SysV init). For example, even a big cloud service will eventually want to figure out why a service keeps crashing, and obliterating the evidence by rebuilding the VM in question is not exactly helpful in that case.

For another, not every outfit in the world is a big cloud service and has the sort of tooling that lets them rebuild a VM automatically at the snap of a finger. These others often don't have the sort of tooling that will reliably track a service started by SysV init for them, either, and therefore stand to gain from a system that will do it for them out of the box. If anything, the big cloud services tend to have people on staff to sort out things like transparent service restart for them even if that means coming up with a way of rebuilding VMs at short notice, because this is basically how big cloud services make their money, while the IT people at other outfits are often forced to spend their time on other more immediately important issues. We can argue that everybody should think and work like a big cloud service (most of the required pieces are conveniently available, after all) but this is not usually how things pan out in actual practice.

Finally, service restart vs. VM rebuilding notwithstanding, even big cloud-based outfits seem to like systemd. Spotify, for example, has officially come out in support of making systemd the default init system on Debian.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 15:41 UTC (Mon) by raven667 (subscriber, #5198) [Link]

> Yes, and because Netflix doesn't do it it must be a stupid idea in general.

Dude, you are barking up the wrong tree, relax.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 21:35 UTC (Mon) by khim (subscriber, #9252) [Link] (2 responses)

> Oh come on now, what he said is true, big "cloud service" shops don't do troubleshooting by hand when there is a failure, they just kill the VM and rebuild it from scratch through their configuration management system.

This is a chicken-and-egg problem: since it was impossible for a long time to reliably separate services without a VM, these “big "cloud service" shops” went with VMs. But the guys who thought that this approach was just stupid developed another solution. This solution (today we know it under the name cgroups) is used today by systemd to manage daemons and services.

Let me repeat once more: the underlying functionality which systemd uses today to manage processes was created specifically for the needs of “large cloud services”. Now, it's entirely possible that some use-cases are covered poorly by this approach (hey, guys from Google developed it for Google, and different clouds are… well, different) but then it just means that it must be fixed.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 22:00 UTC (Mon) by raven667 (subscriber, #5198) [Link] (1 responses)

Yes and my very next sentence was agreeing with you on this point, I was just pointing out that currently what is done for automatic recovery is re-provisioning, as the earlier poster seemed to be arguing that point.

> What might be worth arguing about is whether systemd can replace some home-grown utilities keeping services alive and whether the overall reliability is improved so that you are rebuilding fewer VMs.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 23:04 UTC (Mon) by dlang (guest, #313) [Link]

and I'll say that reprovisioning from scratch is going to be more reliable than containers or cgroups. you don't know what files were corrupted by the misbehaving software.

It's also possible to leave other problems around (shared memory is one thing that I can think of) or have a process issue commands that put the kernel into a different state than what you want.

If you want to be sure, you reprovision.

Since you need to be able to provision from scratch rapidly anyway to deal with bursts of load (in the cloud environment everyone is pushing for), using that mechanism to deal with crashes increases your reliability.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 30, 2014 23:49 UTC (Thu) by zuki (subscriber, #41808) [Link] (81 responses)

1. So you run 'service sshd start' somewhere. No error. Only then do you realize that your coworker uninstalled the sshd package. Your script just returned 0==success after failing to do the requested job.

2. You run your script. The ssh config file has a fatal error. Result the same as above.

3. Let's say that you want to connect to the ssh daemon from a second script. You start the script, only to learn that sshd hasn't actually started yet. You stick a 'sleep 1' in the other script. Everything works on the faster machines and fails on the slower ones.

4. sshd is unexpectedly killed by the oom killer. You run 'service sshd start; service sshd stop' in quick succession. Ooops, the start script returned before sshd managed to write its pidfile. The stop script kills a random program which got the same pid meanwhile.

5. sshd has hung. You run the stop script. The script sends TERM, but the program is seriously stuck and will only react to KILL. Nevertheless, you are notified about success.

Yay for reliability!
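Point 3 above is the classic fixed-sleep race. One common workaround, sketched here under the assumption that "ready" can be approximated by "accepts a TCP connection", is to poll with a deadline instead of sleeping for a guessed interval. The helper names are illustrative, and this is only a workaround: it does not replace an init system that knows when the service is actually up.

```python
import socket
import time


def port_is_open(host, port, timeout=0.5):
    """True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until(check, timeout=10.0, interval=0.1):
    """Poll check() until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical usage: wait up to 10 s for sshd on localhost port 22.
# ready = wait_until(lambda: port_is_open("127.0.0.1", 22))
```

Unlike 'sleep 1', this returns as soon as the check passes on a fast machine and keeps trying on a slow one, failing explicitly once the deadline is exceeded.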

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 0:56 UTC (Fri) by hummassa (subscriber, #307) [Link] (72 responses)

Simple things made reliable; complicated things made possible. *That* is what I think these days systemd has to offer over the alternatives.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 1:16 UTC (Fri) by dlang (guest, #313) [Link] (71 responses)

as long as you are doing things the way the systemd developers think you should

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 1:23 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (70 responses)

Is there a reason you shouldn't do things their way?

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 1:36 UTC (Fri) by dlang (guest, #313) [Link] (69 responses)

well, sometimes their way just doesn't get the job done and you need to work around it.

systemd is not the be-all end-all of software, it does have limits. When you hit those limits, or you have a different model of how you want your system to work than the systemd developers imagined, you now have to fight systemd

I know of one case (that I won't go into here) where people are looking at having to do LD_PRELOAD's on everything that's run to work around systemd

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 1:38 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (67 responses)

???

You can always fall back to SysV-init scripts for something too perverted to be started by systemd.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 1:48 UTC (Fri) by dlang (guest, #313) [Link] (66 responses)

but if what you are doing doesn't play nice with cgroups or some other component that they have deemed 'essential', that won't help you.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 2:05 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (65 responses)

Which other component, exactly?

The only unique feature that is non-optional in systemd is cgroups. And right now you can construct a parallel cgroups hierarchy just fine. The future plans for The One World Order^H Cgroups tree would stop this, but it's not a direct fault of systemd.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 2:12 UTC (Fri) by dlang (guest, #313) [Link] (64 responses)

really?? do you hear what you're saying??

you are in effect saying that if you want to do something that systemd doesn't want you to do, you don't really want to do it.

we should limit our future activities to only what the systemd developers envision is the one true way to do things.

the issue I know about is different, but it is performance related similar to the cgroups example I gave above. I've been asked to keep quiet about it for a bit while they try to work out options with redhat, but from what I'm hearing, the workarounds are not pretty (the LD_PRELOAD seems like the 'clean' way to do things)

But even if this gets worked out and systemd gets changed to make this particular thing easy to do, the attitude that if systemd doesn't support it you should just not do it is one of the big reasons for people to be opposed to systemd.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 2:16 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (18 responses)

>really?? do you hear what you're saying??
Sure.

>you are in effect saying that if you want to do something that systemd doesn't want you to do, you don't really want to do it.
And why not? At least until they are shown to be wrong.

> the issue I know about is different, but it is performance related similar to the cgroups example I gave above.
Which example? What performance issues?

> I've been asked to keep quiet about it for a bit while they try to work out options with redhat, but from what I'm hearing, the workarounds are not pretty (the LD_PRELOAD seems like the 'clean' way to do things)
I don't understand the problem - can it be worked around by disabling extra cgroups controllers (perhaps completely?) and/or by using plain old SysV init?

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 2:32 UTC (Fri) by dlang (guest, #313) [Link] (17 responses)

as I said, I've been asked to keep quiet on these for a little bit, so I won't go into details, but I will say it's not something that can be solved by just using init scripts or disabling cgroup controllers. It requires either not using components that systemd has declared 'essential' not optional, or doing LD_PRELOAD to change the behaviour of the binaries to have them avoid systemd, or ??? which they are looking for.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 2:36 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Since you can't provide an example, I can just assume that it's a kernel issue.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 2:37 UTC (Fri) by zuki (subscriber, #41808) [Link] (6 responses)

So basically some other program has behaviour which is not wanted, but it's systemd developers' fault.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 2:46 UTC (Fri) by dlang (guest, #313) [Link] (5 responses)

well, it works without a problem before systemd inserts itself into the process, replacing the stuff that was there before.

so yes, it is a systemd related problem

it's the systemd developer's fault because they mandate that if you use systemd, systemd will do this.

This isn't a kernel issue, this is sticking strictly with userspace components.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 3:18 UTC (Fri) by dlang (guest, #313) [Link] (4 responses)

for what it's worth, we do know the exact cause of the problem, it's not a mystery and it makes perfect sense.

It's also something that a desktop user is almost never going to run into, and it is a result of physics, not sloppy programming or something like that. The only way to avoid the problem is to sidestep the mandatory systemd component.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 3:24 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

>the mandatory systemd component.
Which one? Automount? Does it have something to do with network latency?

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 5:37 UTC (Fri) by raven667 (subscriber, #5198) [Link]

I'm a little bit annoyed by your lack of detail but if there is a bug report here then I'd love to see it get fixed, which is going to require some technical detail, hopefully not giving away any proprietary secrets.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 19:07 UTC (Fri) by jspaleta (subscriber, #50639) [Link] (1 responses)

I take it this is a bit of proprietary code that none of the people discussing this here will ever get to examine directly?

-jef

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 23:41 UTC (Fri) by dlang (guest, #313) [Link]

as you can see from the final details, no it's not proprietary code, it's a problem that affects anything that generates a large volume of logs.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 9:18 UTC (Fri) by nzjrs (guest, #35911) [Link] (8 responses)

do you think it more likely that your problem will be solved by the plethora of systemd developers or that one upstart dude at canonical?

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 21:42 UTC (Fri) by dlang (guest, #313) [Link] (7 responses)

In this case, the one upstart dude at Canonical

because he's not arrogant enough to believe that he has the answer to all the world's problems and so he wouldn't say that he needs to take over all logging to begin with, but if he did create a way to intercept the logs and offer the capabilities of the journal, he wouldn't declare it mandatory and make it so hard for long-term successful usage patterns to continue.

The initial journald manifesto showed a drastic lack of knowledge about the current state-of-the-art and best practices in logging and the capabilities of syslog tools. Just about everything that was stated was true for the old sysklogd that used to be the default, but syslog-ng has been around a long time, and around 2006/2007 Rsyslog became the default for just about all distros, and the capabilities of Rsyslog and syslog-ng are far greater than the systemd developers imagined.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 22:29 UTC (Fri) by raven667 (subscriber, #5198) [Link] (1 responses)

> The initial journald manifesto showed a drastic lack of knowledge about the current state-of-the-art and best practices in logging and the capabilities of syslog tools. Just about everything that was stated was true for the old sysklogd that used to be the default, but syslog-ng has been around a long time, and around 2006/2007 Rsyslog became the default for just about all distros, and the capabilities of Rsyslog and syslog-ng are far greater than the systemd developers imagined.

Yeah, I kind of thought that too, the journal is the weakest part of the systemd tools, much of its functionality can be, and was, added to rsyslogd. I'm not sure that trying to add all the features of rsyslogd or syslog-ng into journald makes sense, although keeping journald as a small systems logger that can run in early boot (PID 2) and make sure that nothing is lost is a good thing I think, it should just be able to move out of the way if you need high performance. Or maybe a new high-throughput protocol between journald and your syslog daemon using shared memory or kdbus or something, something to reduce context switch overhead for those highest of high throughput systems.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 23:21 UTC (Fri) by dlang (guest, #313) [Link]

journald makes a lot of sense for single-user machines (desktops/laptops) where Rsyslog (and similar) are overkill and the configuration complexity they introduce to do the simple things is unwanted.

But once you move to a system that's providing services to others, you _very_ quickly get to where the advantages of Rsyslog over journald dominate.

As for a high speed interface directly to rsyslog, this will help, avoiding the trip to disk and the need to read and parse the binary format would be a win (there's already been a 'cause rsyslog to fill your disk' bug as the result of journald errors where corruption of the binary format causes the 'next message' pointer to point backwards, and the hack of ignoring it if it points backwards is just that, a hack)

But you still have the situation where the log is written to the socket, then journald reads from the socket into memory, does lookups for the trusted properties, and then will need to format these two sets of data to be sent to rsyslog, send them to rsyslog, and then rsyslog has to parse the trusted properties out.

that's a lot more steps and copies of data than application writing to a socket, rsyslog reading from the socket and looking up the trusted properties.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 22:42 UTC (Fri) by cortana (subscriber, #24596) [Link] (3 responses)

One of the reasons I like journald is because it captures a lot of useful metadata along with each message. Does rsyslog do something similar?

$ journalctl -o verbose | grep $'^\t' | cut -d = -f 1 | sort | uniq -c | sort -n
      1 	EXIT_CODE
      1 	EXIT_STATUS
      4 	SEAT_ID
      5 	KERNEL_USEC
      5 	USERSPACE_USEC
     13 	ERRNO
     13 	EXECUTABLE
     61 	_SYSTEMD_USER_UNIT
    187 	_KERNEL_DEVICE
    187 	_KERNEL_SUBSYSTEM
    187 	_UDEV_SYSNAME
    273 	LEADER
    273 	SESSION_ID
    273 	USER_ID
    451 	RESULT
    795 	UNIT
   1137 	CODE_FILE
   1137 	CODE_FUNCTION
   1137 	CODE_LINE
   1138 	MESSAGE_ID
   1409 	_SOURCE_MONOTONIC_TIMESTAMP
   1824 	_SYSTEMD_OWNER_UID
   1824 	_SYSTEMD_SESSION
  52403 	SYSLOG_PID
  65664 	_SYSTEMD_UNIT
  67418 	_SOURCE_REALTIME_TIMESTAMP
  67928 	_CMDLINE
  67930 	_EXE
  68473 	_SYSTEMD_CGROUP
  68474 	_COMM
  68986 	_GID
  68986 	_PID
  68986 	_UID
  69095 	SYSLOG_FACILITY
  70330 	SYSLOG_IDENTIFIER
  70395 	_BOOT_ID
  70395 	_HOSTNAME
  70395 	_MACHINE_ID
  70395 	MESSAGE
  70395 	PRIORITY
  70395 	_TRANSPORT
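The pipeline above tallies how often each field name occurs. As a rough sketch, the same tally can be done in Python on text captured from `journalctl -o verbose`. The function name `count_journal_fields` is made up for illustration, and the code assumes that field lines are indented and look like `FIELD=value`.

```python
from collections import Counter


def count_journal_fields(lines):
    """Tally field names from `journalctl -o verbose`-style text.

    Assumption: field lines are indented (tab or spaces) and have the
    form FIELD=value; entry header lines start in column 0 and are skipped.
    """
    counts = Counter()
    for line in lines:
        if not line[:1].isspace():
            continue  # entry header or empty string, not a field line
        name, sep, _value = line.strip().partition("=")
        # Journal field names are upper case letters, digits and underscores.
        if sep and name and name.replace("_", "").isalnum() and name == name.upper():
            counts[name] += 1
    return counts
```

Feeding it `sys.stdin` from a `journalctl -o verbose | python3 script.py` pipeline would give per-field counts comparable to the listing above.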

FYI, the fields starting with an underscore are obtained by journald from the kernel, so can be considered unforgeable. The other fields are supplied by the process logging the message, so can't be trusted.

I certainly think that journald has room to improve, but I don't understand the attitude that it's some pointless NIH project that gets in the way of real logging. At the end of the day you can tell journald to set Storage=none and it mostly gets out of the way.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 23:29 UTC (Fri) by dlang (guest, #313) [Link] (2 responses)

short version, rsyslog can grab metadata, the set of data gathered is not identical.

Rsyslog will get the UID, GID, PID, and path to binary, and name as visible to top (which I think is what you list as command line)

http://www.rsyslog.com/what-are-trusted-properties/

If there are other things that need to be pulled, they can be added (as it notes in that link), just identify what you are interested in and give some sort of explanation of why. Pulling every possible hunk of data takes time and space, so if there's a desire for a lot, it probably will need to be configurable.
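For readers wondering where kernel-supplied UID, GID and PID values can come from: on Linux, a receiver on a Unix socket can ask the kernel to attach the sender's credentials to each message (`SO_PASSCRED` / `SCM_CREDENTIALS`). The sketch below is Linux-only and shows just that mechanism; whether it matches exactly what Rsyslog or journald do internally is an assumption, and the function names are made up.

```python
import os
import socket
import struct

# struct ucred on Linux: pid_t pid; uid_t uid; gid_t gid
_UCRED = struct.Struct("iII")


def receive_with_credentials(sock, bufsize=4096):
    """Read one datagram plus the kernel-attached sender credentials."""
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_SPACE(_UCRED.size)
    )
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_CREDENTIALS:
            pid, uid, gid = _UCRED.unpack(cdata[: _UCRED.size])
            return data, {"pid": pid, "uid": uid, "gid": gid}
    return data, None


def demo_credentials():
    """Send one message to ourselves and return what the receiver sees."""
    receiver, sender = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        # Ask the kernel to attach credentials to incoming messages.
        receiver.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)
        sender.send(b"<14>hello")
        return receive_with_credentials(receiver)
    finally:
        receiver.close()
        sender.close()
```

The sender does nothing special here; the values are filled in by the kernel, which is why an ordinary process cannot simply claim another PID or UID.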

most of the other things that you list are things available to rsyslog already.

This had been discussed between Rainer (lead rsyslog developer) and Lennart prior to the announcement of journald, but Rainer had not added it to Rsyslog yet as of the time of the journald announcement.

> I don't understand the attitude that it's some pointless NIH project that gets in the way of real logging

the fact that there was so much FUD and misinformation in the initial announcement and follow-ups is the cause of a lot of this.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 1:00 UTC (Sat) by cortana (subscriber, #24596) [Link] (1 responses)

Ah, that's very interesting, thanks. There's certainly more overlap between what rsyslog and journald can capture than I thought. Sadly I think that journald being interposed between a process transmitting a log message and rsyslog would prevent rsyslog from being able to get the original value of these fields. :(

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 2:38 UTC (Sat) by dlang (guest, #313) [Link]

actually, processes running as root can forge all that data, so journald can forge that data so that Rsyslog gets the correct data. However, as I understand it, not everything gets passed through the log socket, so Rsyslog doesn't get set up that way, instead there is an imjournal module that interfaces with the journal the same way that journalctl does, reading and parsing the files that it creates.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 23:53 UTC (Fri) by anselm (subscriber, #2796) [Link]

> he wouldn't declare it mandatory and make it so hard for long-term successful usage patterns to continue.

The systemd/journald combination does have its good sides, including things that the traditional logging infrastructure simply does not do (like forwarding stderr output from services to the journal, or making recent log entries available with service status output). It is difficult to foresee every single use case for a piece of software like systemd that has so many. We know that the systemd people aren't interested in portability patches, but that doesn't imply that they generally aren't interested in suggestions for features they didn't happen to think of by themselves.

Also, possibly there is a solution for the problem in question that doesn't involve LD_PRELOAD. The filesystem-namespace idea, for example, sounds reasonably like something systemd could support if it doesn't do so already. I'm pretty sure that, given a little time, this will be figured out.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 2:23 UTC (Fri) by zuki (subscriber, #41808) [Link]

Systemd supports standard daemonization through double-forking just fine. It also supports both styles of inetd initialization just fine. So that already pretty much covers the way the majority of daemons are started.

Native systemd initialization protocol is pretty flexible too. For the more complicated case, you can have a #!/bin/bash script somewhere that does whatever magic you want, taking as much time as necessary, at some point notifies systemd about the master PID and exits.
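To make the "notify systemd about the master PID" step concrete, here is a minimal sketch of the notification protocol in Python rather than bash: the service manager passes a datagram socket address in the `NOTIFY_SOCKET` environment variable, and the service sends newline-separated assignments such as `MAINPID=...` and `READY=1` to it. The function names are made up, and the sketch assumes the unit is configured to accept notifications from this process.

```python
import os
import socket


def sd_notify(state, env=os.environ):
    """Send a state string to the socket named by NOTIFY_SOCKET.

    Returns False when NOTIFY_SOCKET is not set (not started by a
    notify-aware service manager), True after the datagram is sent.
    """
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        path = "\0" + path[1:]  # leading '@' denotes an abstract socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), path)
    return True


def announce_main_pid(pid, env=os.environ):
    """Tell the service manager which PID is the main one, then signal readiness."""
    return sd_notify(f"MAINPID={pid}\nREADY=1", env)
```

A startup wrapper could do its slow setup, launch the real daemon, call `announce_main_pid(daemon_pid)` and exit.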

But you seem to need even more than that... I guess my imagination is lacking.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 11:21 UTC (Fri) by nelljerram (subscriber, #12005) [Link] (43 responses)

> I've been asked to keep quiet about it for a bit while they try to work out options with redhat, [...]

So, if I've understood correctly,

- you're aware of a program whose needs aren't currently met by systemd

- discussions are underway with the systemd folk on how to enhance systemd to meet those needs.

Where's the problem, then?

> But even if this gets worked out and systemd gets changed to make this particular thing easy to do, the attitude that if systemd doesn't support it you should just not do it is one of the big reasons for people to be opposed to systemd.

But the actual attitude appears to be different from your last sentence: they've recognized a new need and are working on meeting it.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 21:31 UTC (Fri) by dlang (guest, #313) [Link] (42 responses)

Ok, overnight I was able to touch base with the people who requested that I keep quiet.

journald is considered a mandatory component, you are not allowed to eliminate and bypass it.

But if you are actually going to use your logs, especially in an enterprise environment, you need to get your logs to a central server (or at least to other software)

but routing the logs through journald involves a lot of overhead and for a high volume service writing logs (i.e. a production service), this can be crippling.

without journald, the process of getting your log to syslog is:

syslog() write socket, context switch, syslogd read socket

with systemd the minimum process conceptually is something like:

syslog write socket, context switch, journald read socket, journald persists its own files (at least one switch to kernel land), syslogd write mimicked socket, context switch, syslogd read socket

in practice, I've heard reports that it can take as many as 11 writes from journald to persist the data to the files.

the fundamental problem is that routing through journald will involve a lot of copies of the data and context switches as the data moves around. no amount of optimization can eliminate this. The current real-life is even worse.
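The two paths described above can be written down as step lists and tallied. This is only a counting model of the steps as listed in this comment, not a measurement; the step labels and the `journal_writes` parameter are illustrative, with 11 being the worst case reported above.

```python
from collections import Counter

# Direct path: application -> syslog daemon.
DIRECT = [
    ("app: syslog() writes to socket", "write"),
    ("switch to syslogd", "context_switch"),
    ("syslogd reads socket", "read"),
]


def via_journald(journal_writes=1):
    """Steps when journald sits between the application and syslogd.

    journal_writes: number of write calls journald needs to persist one
    message to its own files (at least 1; reports mention up to 11).
    """
    return (
        [
            ("app: syslog() writes to socket", "write"),
            ("switch to journald", "context_switch"),
            ("journald reads socket", "read"),
        ]
        + [("journald writes journal file", "write")] * journal_writes
        + [
            ("journald writes forwarding socket", "write"),
            ("switch to syslogd", "context_switch"),
            ("syslogd reads socket", "read"),
        ]
    )


def tally(path):
    """Count how many steps of each kind a path contains."""
    return Counter(kind for _name, kind in path)
```

With these lists, `tally(DIRECT)` has one write, one context switch and one read, while `tally(via_journald(11))` has 13 writes, 2 context switches and 2 reads per message.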

The fix that appears to be what is being settled on is to create a library to LD_PRELOAD that redirects the syslog call from /dev/log that journald takes over to some other socket so that the logs can bypass journald.
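An LD_PRELOAD shim itself would be written in C, so the following Python sketch only illustrates the effect being aimed for: the application's syslog traffic goes to a Unix socket other than /dev/log. It uses the standard library's `SysLogHandler`, which accepts a socket path; the helper name and any path passed to it are hypothetical.

```python
import logging
import logging.handlers


def make_syslog_logger(socket_path, name="demo"):
    """Return (logger, handler) that send syslog-format messages to socket_path.

    socket_path is any Unix socket a syslog daemon listens on; choosing a
    path other than /dev/log is what lets the messages skip whatever
    owns /dev/log.
    """
    handler = logging.handlers.SysLogHandler(address=socket_path)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo from also printing via the root logger
    return logger, handler
```

The difference from the preload approach is that here the application cooperates; the shim exists precisely for programs that cannot be changed or recompiled.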

This is a concrete example, and the example of ssh/cgroups is another good example, but the point I was trying to make when I made the initial comment was that systemd is making statements about what the right way to do this is, and I don't care who makes such statements, there will always be a time and place where the best thing to do is to not do things that way. This is almost always under conditions that the person making the statement never imagined.

The best systems not only acknowledge this, but they make it easy to escape their limitations (ideally without having to throw out your entire system in the process)

the outraged response that I got on this thread that there may possibly be some reason to do things in a way other than the systemd way is a very bad sign as far as I'm concerned

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 21:54 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (18 responses)

> journald is considered a mandatory component, you are not allowed to eliminate and bypass it.
You can disable it, if you wish. It's not straightforward, but possible.

Additionally, anything that spams journald by the sheer amount of context switches is waaaay too broken.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 22:21 UTC (Fri) by dlang (guest, #313) [Link] (17 responses)

you are assuming that all logs represent errors.

For many production systems, logs represent activity that needs to be archived, correlated, etc.

It's not just the context switches, it's the need to go through disk I/O

and while you say that journald can be disabled, other folks (ones in direct contact with RedHat on this issue) are saying differently, that disabling journald is not a supported configuration to run systemd in.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 1:23 UTC (Sat) by lsl (subscriber, #86508) [Link] (1 responses)

If you use rsyslog anyway, disable journald's log persistence. It will then use tmpfs storage. This was the standard configuration in Fedora until recently.

Still, I really wonder how much log activity are we talking about here if that leads to performance issues.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 2:36 UTC (Sat) by dlang (guest, #313) [Link]

when I started working with Rsyslog, one of the first things that I identified was that it was doing 4 gettimeofday() lookups per log message (tracking the time it hit each state inside the application). Just removing those lookups resulted in a very significant performance improvement.

Rsyslog sometimes handles very high log volumes, hundreds of thousands of logs/sec (properly configured you will saturate gig-E before Rsyslog runs out of capacity)

the added overhead here is rather substantial, and on an active server it can be a problem. The problem would be that the system just wouldn't be able to handle as much load, but I don't know at what point the journald processes would bog down. Is journald multithreaded? if not, I can easily see it maxing out a single core long before the rest of the server is saturated.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 12:04 UTC (Sat) by khim (subscriber, #9252) [Link] (14 responses)

> you are assuming that all logs represent errors.

No. I'm assuming messages sent to syslog represent events which affect the whole system: some process has started or died, someone did a dangerous operation (started a suid binary, e.g.), etc. They are meant to be consumed by a human and thus should not overwhelm a human. If they can overwhelm journald then a human has no chance. The rule of thumb is simple: if you fear that your message may disappear (because someone will crash your process, or you are in an inconsistent state, etc.) then you send it via syslog(3); if it's an informational message then syslog(3) is just the wrong interface.

We, too, are generating many megabytes of logs. Of course they are not sent via syslog(3)! They are first collected in-process (similar messages are coalesced—this is feasible because they are not just text but have some internal structure), then they are collected by specialized process and sent away in batches via network. This is sane approach for extremely high amount of logs to be processed by some automated scripts. To push all that via syslog will be just crazy.
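A minimal sketch of the "coalesce in-process, ship in batches" idea described above, in Python. The class name, the `send_batch` callback and the record layout are all assumptions made for illustration; the real system being described is not shown anywhere in this thread.

```python
from collections import Counter


class CoalescingLogBuffer:
    """Collect structured events in-process and hand them on in batches.

    Identical events (same name and fields) are merged into one record
    with a count, so a burst of similar messages costs one entry.
    """

    def __init__(self, send_batch, max_events=1000):
        self._send_batch = send_batch  # callable taking a list of records
        self._max_events = max_events  # flush after this many logged events
        self._counts = Counter()
        self._pending = 0

    def log(self, event, **fields):
        key = (event, tuple(sorted(fields.items())))
        self._counts[key] += 1
        self._pending += 1
        if self._pending >= self._max_events:
            self.flush()

    def flush(self):
        if not self._counts:
            return
        batch = [
            {"event": event, "fields": dict(fields), "count": count}
            for (event, fields), count in self._counts.items()
        ]
        self._counts.clear()
        self._pending = 0
        self._send_batch(batch)
```

In this sketch `send_batch` stands in for whatever ships the batch to the collector process or over the network; a timer-driven `flush()` would normally be added so quiet periods do not hold records back.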

XKCD comes to mind, really: when you noticed that rsyslog calls gettimeofday(), your first thought should have been not “why is it doing that?” but “why does no one else care about that?”.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 12:17 UTC (Sat) by zdzichu (subscriber, #17118) [Link] (13 responses)

> No. I'm assuming messages sent to syslog represent events which affect the whole system: some process has started or died, someone did a dangerous operation (started a suid binary, e.g.),

Sounds like you are describing job for audit subsystem, not syslog.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 12:43 UTC (Sat) by khim (subscriber, #9252) [Link] (12 responses)

Well, kinda. Syslog is just the simplest and weakest audit subsystem in existence, but yes, it's an audit subsystem. If you try to store information about every RPC processed by all your daemons it'll choke, whether it's journald or rsyslogd.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 17:13 UTC (Sat) by dlang (guest, #313) [Link] (11 responses)

This shows that you do not understand logging at the enterprise level.

syslog can be an extremely capable logging system, and logs sent to syslog are not the limited, intended only for humans, thing that you think they are.

you are thinking of syslog as it existed 15-20 years ago, not what Rsyslog, syslog-ng, nxlog, and logstash have all been doing for years.

I've run Rsyslog at gig-E wire speeds (~400,000 logs per second) and other people with faster networks have run it at over 1,000,000 logs per second. This will keep up with logging information about every RPC processed by your daemons.
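As a quick sanity check on those figures, here is the arithmetic in Python. It ignores Ethernet, IP and transport framing overhead, so it is only an upper bound on the average message size; the function name is illustrative.

```python
def bytes_per_message(link_bits_per_second, messages_per_second):
    """Average bytes available per message when the link is saturated."""
    return link_bits_per_second / 8 / messages_per_second


# Gigabit Ethernet at roughly 400,000 messages per second
GIG_E_AT_400K = bytes_per_message(1_000_000_000, 400_000)
```

That comes to about 312 bytes per message, which is a plausible size for a syslog line with some metadata.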

And people do just that on a routine basis.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 18:42 UTC (Sat) by raven667 (subscriber, #5198) [Link] (2 responses)

I think this shows the difference between the journal and the rest of systemd. The init system has been largely stagnant for 20y and was in serious need of an overhaul, the log system was stagnant for a while years ago but there is now heavy competition and development, so much of this re-invention to bring logging into the modern era was already done by syslog-ng and rsyslog.

There is definitely a niche where new development was needed, to reliably capture data from the earliest part of boot, to record _all_ of the available meta-data the kernel can provide and to make the data at rest more easily searchable and tamper resistant. How much of this could work within the framework of the existing competitive logging utilities or needed new development is a value judgement and matter of opinion but while I think that the new journal implementation is fine for small systems, for large systems I'd rather extend the existing dominant log daemons to be able to handle the cases that journald picks up or have an early hand off.

Syslog has clearly defined inputs and outputs and is more amenable to multiple implementations being drop in replacements for one another than some of the other parts of init and systemd.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 19:49 UTC (Sat) by khim (subscriber, #9252) [Link] (1 responses)

> Syslog has clearly defined inputs and outputs and is more amenable to multiple implementations being drop in replacements for one another than some of the other parts of init and systemd.

Syslog's “clearly defined inputs and outputs” include specialized marks for “ftp daemon” and “USENET news subsystem” but have no marks for HTTP server. Nuff said.

I can understand why one will want to plug syslogd interface into these modern solutions (these legacy systems need to be supported, too), but to actually keep that joke of the interface around and build everything around it using larger and larger pile of hacks… that's just sad.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 2, 2014 22:31 UTC (Sun) by dlang (guest, #313) [Link]

the facility tag is 6 bits of data in the message, if you are so put off by the contents of those 6 bits (which mostly get ignored on large scale systems anyway), that you will dismiss anything that supports them, you are throwing the baby out with the bathwater.
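For reference, the facility travels in the `<PRI>` prefix of each message, where PRI = facility * 8 + severity in the RFC 3164 / RFC 5424 wire formats. The sketch below splits and builds that prefix; the function names are made up, and only a few facility names are listed (news is 7, ftp is 11).

```python
import re

_PRI = re.compile(r"^<(\d{1,3})>")

# A few of the traditional facility codes; 16-23 are local0..local7.
FACILITY_NAMES = {0: "kern", 1: "user", 3: "daemon", 7: "news", 11: "ftp"}


def encode_pri(facility, severity):
    """Build the numeric PRI value from facility and severity (0-7)."""
    return facility * 8 + severity


def split_pri(message):
    """Return (facility, severity, rest) for a message starting with <PRI>."""
    match = _PRI.match(message)
    if match is None:
        raise ValueError("message has no <PRI> prefix")
    facility, severity = divmod(int(match.group(1)), 8)
    return facility, severity, message[match.end():]
```

For example, `split_pri("<14>hello")` gives facility 1 (user), severity 6 (info) and the text `hello`. A receiver that does not care about the facility can drop the first element and route on the content instead.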

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 19:41 UTC (Sat) by khim (subscriber, #9252) [Link] (7 responses)

> This shows that you do not understand logging at the enterprise level.

True. I do understand logging at the Amazon/Bing/Google/Yandex level, though.

> syslog can be an extremely capable logging system, and logs sent to syslog are not the limited, intended only for humans, thing that you think they are.

I don't doubt that for a second. This is just an application of first part of the third fundamental networking truth, though (with sufficient thrust, pigs fly just fine). From my experience “enterprise” guys tend to forget about second part (However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead).

When scalability is concerned the best approach is Jeff Dean's rule: design for ~10X growth, but plan to rewrite before ~100X. Syslog design was not intended to handle such use-cases and while it's possible to stretch it to cover that use case it's a really, really, REALLY bad idea.

> you are thinking of syslog as it existed 15-20 years ago, not what Rsyslog, syslog-ng, nxlog, and logstash have all been doing for years.

Nope. I'm looking at all that activity in wonder and thinking: are they just stupid or terminally insane? Syslog was not designed for such use cases. It passes data around via sockets, and sockets are just not designed to handle such a use case. Also if you are generating so much similar data (don't tell me all these 400'000 records are totally dissimilar, I'll not believe it for a second…) there are vast coalescing opportunities. And it's important to use them because when you need to send such a massive amount of data around you want to send it in batches. Not just context switches are costly at these rates, but transfers between different CPUs in today's SMP systems are costly, too!

> I've run Rsyslog at gig-E wire speeds (~400,000 logs per second) and other people with faster networks have run it at over 1,000,000 logs per second. This will keep up with logging information about every RPC processed by your daemons.

Maybe. But how much overhead will it incur on a typical Google server with 4x10GbE (perhaps 4x100GbE by now, I'm not sure)? Is it wise to spend this much computing power on logging?

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 20:30 UTC (Sat) by raven667 (subscriber, #5198) [Link] (6 responses)

It seems you two are solving different problems. In the business computing space you are running software from multiple vendors as well as internally developed stuff and so a simple standard is what is used for these systems to inter-operate. In the big service provider computing space you can apply computer-science talent to build custom systems because you control your environment from the hardware on up. Those are substantial differences and I can see how you come to different conclusions as to what the "right" answer is on how to log.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 21:26 UTC (Sat) by khim (subscriber, #9252) [Link] (5 responses)

This is part of the answer but it's not the whole answer. “Big data” firms quite often are using “software from multiple vendors” too. And quite often it needs complex massaging to make sure it behaves as it should in the appropriate context.

I never worked in the “enterprise” but I have quite a few friends and colleagues who did and from their explanation the real problem is related to in-the-enterprise-politics: you can not just go and say “our use of X is wrong, we need to replace it with Y or Z”. Because this will make the one who mandated X angry and may undermine their authority. If there is a way to keep X and still achieve objectives this is the way that'll be chosen. Quite often this means that more resources and more money will be spent in the end but nobody will be forced to admit that s/he's wrong.

That's fine, they could do whatever they want but it also means that it's their responsibility to unclog their mess: if they are using screwdrivers to hammer in nails and the new screwdriver is not usable in that role then it's their responsibility to invent something to live with it.

In this thread there were quite a few explanations for how they could continue to use rsyslogd with systemd (LD_PRELOAD, namespaces, etc). It's doable. Yes, it'll be ugly and perhaps somewhat fragile but so what? The whole scheme was fragile to begin with.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 22:21 UTC (Sat) by raven667 (subscriber, #5198) [Link] (4 responses)

Most businesses just aren't going to fund the development of an in-house Linux distro that disallows /dev/log and syslog in favor of a home grown, non-standard system or drop any vendor or in house software that doesn't play nice with their custom logging mechanism, let alone give up on the whole ecosystem of log monitoring systems and SIEMs that consume syslog data. That's not just politics, not every business can have the technical staff to do things the way a technology services company does things.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 2, 2014 10:48 UTC (Sun) by khim (subscriber, #9252) [Link] (3 responses)

There is no need to use one solution for everything. /dev/log and syslog(3) work fine for low-volume logging. They will work even if your program is almost completely broken and this is a valuable property to have.

As for “we can not drop or change X because it's too expensive”: I've seen that, too, and more often than not it ends in a disaster. If it's “too expensive” to change X now then what gives you confidence that it'll be cheaper later? This is typical enterprise myopia which puts this quarter's results above long-term survival. If you don't control something in your company then the costs spent on this piece tend to grow till the whole house of cards collapses.

It's true that sometimes it's cheaper to use off-the-shelf solutions because the things you are doing are not that important, but if you are starting to care about one vs four context switches then either this part is in your core competence (and thus must be under your control) or you are large enough to dedicate resources to scalable solutions even if it's not your core competence.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 2, 2014 16:47 UTC (Sun) by raven667 (subscriber, #5198) [Link]

That's much more reasonable and less strident than what I thought you were proposing.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 2, 2014 22:30 UTC (Sun) by dlang (guest, #313) [Link] (1 responses)

it's more than one context switch vs four, and yes, businesses should not have to worry about these details. They should just be able to select between available tools that interoperate nicely with each other.

This includes being able to select between the different syslog implementations, and it also means being able to select between the huge number of tools that exist that deal with syslog messages today (including many that support very large volumes of logs)

No, this is not suitable for Google or Amazon levels of messages without consolidation, but it is suitable for just about any company below those levels.

Companies may choose to write customized logging mechanisms for their custom software, but every company (including google and amazon) runs a lot of software that they did not develop from scratch (think routers and switches for example), and so whatever system they use, it's going to have to support syslog anyway.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 0:18 UTC (Mon) by khim (subscriber, #9252) [Link]

> think routers and switches for example

Well, yeah, this is a good example, LOL. But the fact that Google actually has its own software on routers and switches is just a funny coincidence. No, that fact is not the main difference. The difference between Amazon/Bing/Google/Yandex and “enterprise” lies not in the fact that “big data companies” deal with larger amounts of traffic but in priorities: Amazon/Bing/Google/Yandex know that all their solutions may become deficient in the future and thus have contingency plans which are rolled out long before the scalability limits of the existing architecture are reached. “Enterprises” tend to exploit whatever they have till they reach 1000% of the intended scalability level where any minor change can collapse the whole house of cards, then start running around like headless chickens when, inevitably, such a change is actually introduced. Think Danger. I'm not sure why: certainly outages which may ruin the whole company are as important for Blackberry or Verizon as they are important for Google so why such a big difference in attitude?

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 21:59 UTC (Fri) by foom (subscriber, #14868) [Link]

The "outraged" response you get is because you just said vaguely "systemd is totally broken it sucks I hate it!". What did you expect, technical answers?

You seem to believe that journald fundamentally cannot disable writing to disk, which doesn't appear to be true.

http://www.freedesktop.org/software/systemd/man/journald....
documents "Storage=none" in journald.conf to disable all writing of log data.
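As a sketch of what that could look like, here is a journald.conf fragment. `Storage=none` is the setting named above; `ForwardToSyslog=yes` is the documented option for passing messages on to a syslog daemon, and is included here on the assumption that a traditional syslog daemon such as rsyslog is meant to do the real work. Defaults differ between systemd versions, so check the man page for the version in use.

```ini
# /etc/systemd/journald.conf (fragment)
[Journal]
# Do not store any log data in the journal itself
Storage=none
# Hand every message on to the traditional syslog daemon
ForwardToSyslog=yes
```

journald still sits on /dev/log with this configuration, so the extra hop through journald remains; only the writes to the journal files go away.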

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 22:18 UTC (Fri) by raven667 (subscriber, #5198) [Link] (1 responses)

> the fundamental problem is that routing through journald will involve a lot of copies of the data and context switches as the data moves around. no amount of optimization can eliminate this. The current real-life is even worse.

That's an interesting problem for high-volume logging. I'm going to be very interested to see how this is solved by the systemd team. It'd be nice if you could replace journald with rsyslogd for high-volume local logging, rsyslogd 7.4 even has journal input and output support, maybe building a way to pass the /dev/log open file descriptor back to your preferred syslog daemon when it starts up later in the boot process, as journald starts very very early to capture everything and I don't think we want to lose that functionality.

systemd is going to have to find a way to handle this workload efficiently

> the outraged response that I got on this thread that there may possibly be some reason to do things in a way other than the systemd way is a very bad sign as far as I'm concerned

Well this whole thing has become so politicized and full of misinformation that it becomes easy to presume malicious intent when there is a lack of information. You've provided one of the few real concrete examples where the systemd tools fall down, and your technical criticism has merit.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 23:14 UTC (Fri) by dlang (guest, #313) [Link]

> I'm going to be very interested to see how this is solved by the systemd team.

My understanding (as part of rsyslog, but not in direct contact with Red Hat) is that the answer is the introduction of liblogging.

Red Hat was already talking about a liblogging to override the glibc syslog writes so that they could add metadata and structure to the logging (see project lumberjack), and they are adding the ability to redirect the logs somewhere other than /dev/log, overriding what the program does.

By doing this and linking it in directly to things they compile from source or LD_PRELOADing it for things that come from elsewhere, they gain the ability to bypass journald.

This is not something I'm happy about: it's far too hackish, and overriding what the program is compiled to do is the right thing in most cases, but not all (see my rant earlier about not deciding the 'one true way' :-)

If systemd did gain the ability to pass /dev/log directly to programs, I'm sure that we would add the ability to accept the filehandle very quickly (one question though, how many filehandles are we talking about? is it a single filehandle globally, or one per service? Since journald talks about being able to aggregate all logs related to a service, no matter how they are written it's now a question)
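
For what it's worth, handing an already-open socket to another process is ordinary Unix fd passing over an AF_UNIX socket (SCM_RIGHTS). A minimal Python sketch of the mechanism only; it says nothing about how journald or rsyslog would actually do it, and the helper names are made up for illustration:

```python
import array
import os
import socket


def send_fd(chan, fd):
    """Send one open file descriptor over a connected AF_UNIX socket."""
    fds = array.array("i", [fd])
    # At least one byte of ordinary data has to accompany the ancillary data.
    chan.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(chan):
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array("i")
    _, ancdata, _, _ = chan.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any truncated trailing bytes in the control message.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0]


def demo():
    """Pass the read end of a pipe across a socketpair and read through the copy."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        send_fd(left, read_end)
        received = recv_fd(right)
        os.write(write_end, b"log line\n")
        data = os.read(received, 64)
        os.close(received)
        return data
    finally:
        os.close(read_end)
        os.close(write_end)
        left.close()
        right.close()
```

The receiver ends up with its own descriptor for the same open file, so whether there is one handle globally or one per service is purely a policy question for whoever does the sending.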

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 22:31 UTC (Fri) by cortana (subscriber, #24596) [Link] (2 responses)

Hm, have you considered running the affected service in its own mount namespace, within which you bind mount /run/systemd/journal/syslog over /dev/log? Seems cleaner than LD_PRELOAD games, assuming that bind mounting sockets works of course; I've not actually tried it myself.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 23:22 UTC (Fri) by dlang (guest, #313) [Link] (1 responses)

how would you do this in systemd?

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 0:52 UTC (Sat) by cortana (subscriber, #24596) [Link]

Today, maybe with: ExecStart=/usr/bin/unshare -m -- /bin/sh -c 'mount --bind /run/systemd/journal/syslog /dev/log && myprogram ...'

I'd like to see systemd get an option to configure a private mount namespace, then it could become a bit less ugly:

[Service]
PrivateMountNamespace=foo
ExecStartPre=/bin/mount --bind /run/systemd/journal/syslog /dev/log
ExecStart=myprogram ...

Where the name of the namespace is used to create a file, /run/mountns/foo, which can be used by processes from other namespaces to obtain a file descriptor that they can pass to setns() in order to enter the namespace. That way, several services can share a common mount namespace. 'ip netns' provides a similar facility for managing network namespaces.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 6:32 UTC (Sat) by fandingo (guest, #67019) [Link] (16 responses)

I've been frustrated by this same issue, but at a smaller scale, for some time.

I don't claim any insight into the direction of journald development, but it seems quite incomplete. There is the systemd-journald-gateway (package name on Fedora at least). It's such a half-assed network implementation though. First, it's pull only, and second, not that useful at any sort of scale.

I guess what I find most surprising is that http://lists.freedesktop.org/archives/systemd-devel/2012-... doesn't seem to have gone anywhere that I can find. Ultimately, the journal needs some way to *send* log messages in realtime via some sort of journal network protocol (which this patch provided in 2012). Then, you can at least get a centralized journal instance. After that, Logstash, Splunk, and the other log parsing vendors can implement a journald receiver, and everything should return to harmony.

Does anyone have any idea why this never got merged or refined further? The journal is so useless without it.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 6:45 UTC (Sat) by dlang (guest, #313) [Link] (12 responses)

i suspect that once they started to hear about enterprise scale logging they decided they didn't want to go there (at least not yet) and started saying that they would just pass the logs through to traditional syslog daemons.

As it is, you can very easily use rsyslog to get the logs from the journal and send them, as long as you aren't doing that large a volume.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 8:30 UTC (Sat) by fandingo (guest, #67019) [Link] (11 responses)

But once you leave journald, you lose 3/4 of the metadata.

If a journald network sender were implemented, your complaints about scale would almost certainly disappear completely. Obviously, there would be no memory copies to send a message from journald to an external sender (like rsyslog). Additionally, it's already trivial to avoid *any* storage by specifying Storage=none.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 8:57 UTC (Sat) by dlang (guest, #313) [Link] (10 responses)

is journald really going to implement all the different network protocols?

why is it that journald refuses to pass the metadata on to a syslog daemon? that would be a lot simpler.

and even if the journal could send the logs over the network, there is still a lot that syslog daemons can do in terms of filtering that journald can't.

why is it NIH if the syslog daemons try to work around journald to keep doing their job, but it's not NIH for journald to throw away decades of experience and code to do things from scratch?

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 9:15 UTC (Sat) by mchapman (subscriber, #66589) [Link] (9 responses)

> why is it that journald refuses to pass the metadata on to a syslog daemon? that would be a lot simpler.

Isn't that precisely what Rsyslog's imjournal module is for?

If your question is "why can't journald pass on the metadata as part of the syslog messages themselves?" then you're going to have a whole lot more complexity in deciding how such metadata should be formatted, and that's probably a lot of config that has no rightful place in journald itself.

Remember, the journal can contain pretty much *arbitrary* metadata, whatever applications might decide to store there, so there's really no standard way of formatting it all. This is the 3/4 fandingo was talking about. As an example, I am personally interested in writing a mod_log_journal for Apache that provides all the kinds of stuff you can get out of mod_log_config, all nicely indexed into different journald fields -- how on Earth is journald expected to be able to provide that to Rsyslog?
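
To make the "arbitrary fields" point concrete, here is a rough Python sketch of how a set of fields is serialized for journald's native protocol as I understand it: one FIELD=value line per field, with a length-prefixed form for values that contain newlines. The resulting bytes would be sent as a single datagram to the native journal socket. The HTTP_* field names are invented for illustration and are not part of any existing module:

```python
import struct


def encode_journal_fields(fields):
    """Serialize a dict of field name -> str/bytes value as one native journal payload."""
    out = bytearray()
    for name, value in fields.items():
        if isinstance(value, str):
            value = value.encode("utf-8")
        key = name.encode("ascii")
        if b"\n" in value:
            # Values with embedded newlines use the binary-safe form:
            # NAME, newline, 64-bit little-endian length, data, newline.
            out += key + b"\n" + struct.pack("<Q", len(value)) + value + b"\n"
        else:
            out += key + b"=" + value + b"\n"
    return bytes(out)


# Hypothetical fields an Apache logging module might emit.
example = encode_journal_fields({
    "MESSAGE": "GET /index.html",
    "HTTP_STATUS": "200",
    "HTTP_METHOD": "GET",
})
```

Nothing in this format tells a downstream syslog daemon which of those fields matter or how to flatten them into a single line, which is the formatting problem described above.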

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 17:07 UTC (Sat) by dlang (guest, #313) [Link] (6 responses)

yes, that is what imjournal is doing, because the journal won't send the metadata to syslog in the passthrough mode.

the applications are sending their data to the journal via /dev/log and syslog calls, just exactly the same way that they would send it to syslog if the journal didn't exist (plus capturing stdout and stderr which are usually unformatted garbage)

From anti-systemd to pro-systemd in the shortest time

Posted Feb 2, 2014 0:29 UTC (Sun) by mchapman (subscriber, #66589) [Link] (5 responses)

> yes, that is what imjournal is doing, because the journal won't send the metadata to syslog in the passthrough mode.

I'm not sure what you mean by "passthrough mode" here. The metadata was never inline with the messages in the first place, even before it got into journald.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 2, 2014 22:35 UTC (Sun) by dlang (guest, #313) [Link] (4 responses)

without systemd the syslog daemon could gather this data itself (the same way that journald does), in the mode where journald sends logs to the syslog /dev/log, this data is not all still available (some gets forged to be available, which costs additional processing time, other data is just not there)

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 0:00 UTC (Mon) by mchapman (subscriber, #66589) [Link] (3 responses)

> without systemd the syslog daemon could gather this data itself (the same way that journald does), in the mode where journald sends logs to the syslog /dev/log, this data is not all still available (some gets forged to be available, which costs additional processing time, other data is just not there)

I think we might be talking about different things.

I am interested in providing structured messages -- multiple fields containing all sorts of useful metadata -- from applications themselves. I am not too concerned about the "automatic" metadata journald adds; it's clear that if journald can add that, then so can any other logging system.

I note that you've linked to RFC 5424 elsewhere. The structured data extension in that does look interesting; I wonder if journald can support it (or if anybody has asked the developers or proposed it to them).

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 4:54 UTC (Mon) by dlang (guest, #313) [Link] (2 responses)

journald would just store this as text.

The RFC5424 structured data really isn't used very much, and even rsyslog (whose primary developer is the author of RFC5424) didn't have the ability to actually _use_ this data until a few months ago.

in spite of the official standard, the de-facto standard is looking like it's going to end up being JSON structured messages. All of the syslog daemons can parse JSON structured logs and act on the different variables in them (and modify them for that matter)
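
A small sketch of what such a message looks like on the wire, assuming the "@cee:" cookie convention from the lumberjack/CEE work. The line is simplified (no timestamp or hostname) and the tag and field names are made up:

```python
import json

LOG_USER = 1  # syslog facility "user"
LOG_INFO = 6  # syslog severity "informational"


def cee_line(tag, fields, facility=LOG_USER, severity=LOG_INFO):
    """Build a simplified syslog line carrying a JSON payload after the @cee: cookie."""
    pri = facility * 8 + severity  # PRI value as defined by the syslog protocol
    payload = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return "<{}>{}: @cee: {}".format(pri, tag, payload)


def parse_cee(line):
    """Return the decoded JSON fields of a line built by cee_line(), or None."""
    marker = "@cee:"
    pos = line.find(marker)
    if pos < 0:
        return None
    return json.loads(line[pos + len(marker):])


line = cee_line("myapp", {"msg": "login ok", "user": "alice"})
```

A receiver that does not understand the cookie still sees an ordinary text message, which is a large part of why this approach is easy to deploy.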

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 18:38 UTC (Mon) by rodgerd (guest, #58896) [Link] (1 responses)

That's good news, because while I appreciate journald forcing the issue of structured data, I don't appreciate the binary on-disk format.

Why it seems to have taken systemd to light a fire under the arses of syslog developers is another question entirely...

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 18:55 UTC (Mon) by dlang (guest, #313) [Link]

> Why it seems to have taken systemd to light a fire under the arses of syslog developers is another question entirely...

It didn't, the syslog developers had already been working on these things (and many others)

RFC5424 and the CEE standardization projects came out several years before the systemd journal.

The only features that the systemd journal had (or at least claimed to have) that syslog didn't have at release were

capturing STDOUT and STDERR (which I'm not sure is really that good an idea, as sloppy as many developers are with these)

trusted properties (some of which are racy to gather, so not as reliable as you may think), which had been discussed and was on the TODO list, but had not made it to the top

log 'security', which as many noted, the journald security did not protect you against malicious root on your system unless you sent data elsewhere, and sending data elsewhere is usually the best defence anyway. Syslog developers have implemented better anti-tampering features that specifically document what external connectivity is needed to make things secure.

It doesn't help that distros tend to ship such old versions of the syslog daemons. It is hard to see progress when the version that's included is several years old.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 18:15 UTC (Sat) by raven667 (subscriber, #5198) [Link] (1 responses)

The current syslog RFC includes a way to format structured data which might be useful as a bridge between the journal and traditional syslog.

http://tools.ietf.org/html/rfc5424#section-6.3
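
As a rough illustration of that section, a Python sketch that formats one structured-data element with the escaping the RFC requires inside parameter values. The SD-ID "journal@32473" and the parameter names are hypothetical (32473 is the enterprise number used in the RFC's own examples):

```python
def sd_escape(value):
    """Escape backslash, double quote and ']' as required inside a PARAM-VALUE."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")


def sd_element(sd_id, params):
    """Format one SD-ELEMENT: [SD-ID name="value" ...]."""
    body = "".join(' {}="{}"'.format(name, sd_escape(value))
                   for name, value in params.items())
    return "[" + sd_id + body + "]"


element = sd_element("journal@32473", {"unit": "sshd.service", "note": 'a "b" ]'})
```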

From anti-systemd to pro-systemd in the shortest time

Posted Feb 2, 2014 22:20 UTC (Sun) by dlang (guest, #313) [Link]

In addition to this, every syslog implementation supports dealing with JSON logs and being able to decode them, see the CEE log standardization project and lumberjack

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 13:09 UTC (Sat) by zuki (subscriber, #41808) [Link] (2 responses)

> I guess what I find is most surprising is that http://lists.freedesktop.org/archives/systemd-devel/2012-...
> doesn't seem to have gone anywhere that I can find.
> Does anyone have any idea why this never got merged or refined further?

It got put on a back-burner :) The reason was simply that due to lack of time, I had to give other things higher priority. Actually voices requesting such remote logging functionality have been almost non-existent, so I simply thought that nobody really cares about that. I've started working on this again, so hopefully there'll be an update soon.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 17:53 UTC (Sat) by fandingo (guest, #67019) [Link] (1 responses)

> It got put on a back-burner :) The reason was simply that due to lack of time[...]

I certainly understand that.

> Actually voices requesting such remote logging functionality have been almost non-existent

Honestly, this surprises me since some of the systemd developers work at Red Hat and even your use-case at GMU (apologies for stalking you on the Fedora wiki). In your experience, how are people using the journal on multiple systems? Are they using Journal->Rsyslog for aggregation, not aggregating their logs, or using a totally separate aggregator like Logstash/Splunk? I hope that you get the time to work on it because it's a really exciting feature.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 2, 2014 3:35 UTC (Sun) by zuki (subscriber, #41808) [Link]

> In your experience, how are people using the journal on multiple systems?
> Are they using Journal->Rsyslog for aggregation, not aggregating their
> logs, or using a totally separate aggregator like Logstash/Splunk?
I know that some people do use journald+rsync or journald-on-nfs. Many use rsyslog to forward messages. Journald could definitely use some care. In addition to the missing push-over-the-network functionality, it is currently fairly slow in collecting its metadata, and has other performance issues. kdbus should solve the issues with metadata collection, and probably some smart profiling will be needed to find the reason for other slowness.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 11:00 UTC (Fri) by anselm (subscriber, #2796) [Link]

OTOH, if you hit the limits of SysV init (which does tend to happen rather earlier than with systemd) you get to fight SysV init, usually by writing large swathes of shell code that may or may not work reliably on the machine in question and are almost guaranteed not to work at all on the next machine over that uses a different Linux distribution.

The systemd developers are opposed to porting systemd to other kernels but so far they don't seem categorically opposed to suggestions about reasonable use cases on Linux that systemd does not yet handle well, or to bug fixes. Contrast this to Upstart, which has all sorts of serious bugs that nobody seems to be eager to fix, or to SysV init, which is buggy by design.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 23:35 UTC (Sat) by cdmiller (guest, #2813) [Link] (7 responses)

1. So you run 'service sshd start' somewhere. No error. Only then you realize that your coworker uninstalled sshd package. Your script just returned 0==success after failing to do the requested job.

Others use Puppet or similar making sure the package is always installed, configured properly, and running, else copious error logs and notifications ensue.

2. You run your script. ssh config file has a fatal error. Result the same as above.

Others use Puppet to manage the configs under version control making sure the latest tested working copy is always in place.

3. Let's say that you want to connect to the ssh daemon from a second script. You start the script, only to learn that sshd hasn't actually started yet. You stick a 'sleep 1' in the other script. Everything works, on the faster machines, fails on the slower ones.

"You" may do something dumb to try and solve the problem, others hopefully do it better.

4. sshd is unexpectedly killed by oom killer. You run 'service sshd start; service sshd stop' in quick succession. Ooops, the start script returned before sshd managed to write its pidfile. The stop script kills a random program which got the same pid meanwhile.

Sucks to be "you". Others increase the RAM for their VM and troubleshoot what's triggering the OOM killer.

5. sshd has hung. You run the stop script. The script sends TERM, but the program is seriously stuck and will only react to KILL. Nevertheless, you are notified about success.

Apparently troubleshooting this script's behaviour is beyond "you". Others track down and fix the problem causing sshd to hang.

Yay for competence, for which there is no software substitute.
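
For point 3, the usual alternative to a fixed 'sleep 1' is to poll with a deadline. A rough Python sketch; the function name and defaults are only for illustration:

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.1):
    """Return True once a TCP connect to (host, port) succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable): retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

This behaves the same on fast and slow machines, but it is still a workaround that every dependent script has to carry around.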

From anti-systemd to pro-systemd in the shortest time

Posted Feb 2, 2014 1:10 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

So…your solution to make shell code viable is to set up, administer, and maintain things through a Ruby system? Puppet is *way* overkill for less than, say, 12 machines. In that case, IMO, systemd is an obvious winner there.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 2, 2014 1:50 UTC (Sun) by vonbrand (subscriber, #4458) [Link]

I for one much prefer not to get into those situations to be "fixed" with yet another thick layer of software to learn, set up, and generally manage, all to work around the legacy mess. Next claim is that all this is "simple, streamlined," while systemd is "bloated."

From anti-systemd to pro-systemd in the shortest time

Posted Feb 2, 2014 3:51 UTC (Sun) by zuki (subscriber, #41808) [Link]

I was trying to show rather fundamental ways in which the sysv example script can return false or misleading results or do something completely unexpected. Of course each and every one of those has a workaround. Knowing all those hacks is competence in a sense. It is also time wasted which could actually be put to better use than fighting around deficiencies in the basic building blocks of the OS. I don't want to "troubleshoot" anything, if the service manager already has the necessary information. I don't want to check if the daemon I started is actually running. I think we can agree that having a system which handles OOM conditions reasonably is better than always throwing "enough" RAM at the problem.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 10:40 UTC (Mon) by HelloWorld (guest, #56129) [Link] (1 responses)

This culture of defending tooth and nail even the most broken tools and approaches is *exactly* what is wrong with “Unix culture”.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 18:35 UTC (Mon) by rodgerd (guest, #58896) [Link]

It's a general techie problem. Always quick to sneer about other people's buggy whips, always whining about change and being over-invested in the value of having learned how to fix something that should never have broken in the first place, like some sort of bizarre Stockholm Syndrome.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 18:34 UTC (Mon) by rodgerd (guest, #58896) [Link]

> Yay for competence, for which there is no software substitute.

I quite agree. I'm not sure you've thought what I imagine you think is a devastating riposte through carefully enough, because if you think the right answer to your broken hacky scripts is Puppet, or the answer to the OOM killer is always memory, I can tell you it's not the parent poster who looks bad.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 18:50 UTC (Mon) by cortana (subscriber, #24596) [Link]

If you're going to quote other posts, please indent them with a > symbol so that the result becomes readable. This is essential if you are going to intersperse points inline with the original post.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 16:53 UTC (Fri) by raven667 (subscriber, #5198) [Link] (6 responses)

The systemd config isn't that long, there are several direct features that your naive script doesn't handle, there is a reason that all the complexity of the distro scripts have evolved over the years, you call them an abomination, which is true, but every line was added to solve a real need.

- your script might start sshd before the system is ready to run it so it isn't listening on the right interfaces, startup log messages could be lost, etc.
- any messages sent to stderr/stdout can be lost with your script
- your script will appear to be successful regardless of whether the service actually started or not
- your script doesn't include a scheme to automate changes to the command line or environment except by hand-editing the script; it's easier to manage an included file with standard unix tools like cp and mv than hand-editing scripts
- your script doesn't check if the host keys have been created before trying to start sshd and doesn't create them if they are missing
- your script doesn't include a reload, restart or status target
- your script doesn't monitor and restart the service if it should fail
- your script requires the daemon to include logic for writing out PID files, forking away from the controlling terminal. this is part of the complexity of starting a service and every daemon has to re-implement that code
- as someone else pointed out, there is no guarantee that a PID file actually matches a running service, so if the daemon dies or is killed then your script will try to kill some random unrelated process
- your script doesn't sanitize its environment, so variables can leak from the shell of the user running it into the daemon's environment, or the daemon has to carefully sanitize the environment itself, additional complexity that has to exist somewhere.
- the systemd units will start the daemon on first access, so it doesn't need to run until you want to use it, a minor efficiency gain.

more /usr/lib/systemd/system/sshd.*
::::::::::::::
/usr/lib/systemd/system/sshd.service
::::::::::::::
[Unit]
Description=OpenSSH server daemon
After=syslog.target network.target auditd.service

[Service]
EnvironmentFile=/etc/sysconfig/sshd
ExecStartPre=/usr/sbin/sshd-keygen
ExecStart=/usr/sbin/sshd -D $OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=42s

[Install]
WantedBy=multi-user.target
::::::::::::::
/usr/lib/systemd/system/sshd.socket
::::::::::::::
[Unit]
Description=OpenSSH Server Socket
Conflicts=sshd.service

[Socket]
ListenStream=22
Accept=yes

[Install]
WantedBy=sockets.target
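
On the service side, a socket-activated program generally finds its inherited sockets through the LISTEN_PID and LISTEN_FDS environment variables, with descriptors starting at fd 3. A minimal Python sketch of that check, roughly what sd_listen_fds() does. It only illustrates the general protocol and is not how sshd itself is wired up; the function name and parameters are my own:

```python
import os

SD_LISTEN_FDS_START = 3  # first inherited descriptor in the activation protocol


def listen_fds(environ=None, pid=None):
    """Return the list of inherited socket fds, or [] if this process was not activated."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        # The variables only apply to the process they were set for.
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + max(count, 0)))
```

The daemon never has to open the port itself, which is what lets the service manager start it on first access.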

From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 22:59 UTC (Sat) by cdmiller (guest, #2813) [Link] (5 responses)

The "naive" example is a "piece" of a sysv init script, duh. The full script is in the "UNIX and Linux System Administration Handbook (4th Edition)" by Nemeth et al. All here have access to a copy if they want to see the entire script in its original glory.

Yours and others critiques of the script's style seem misplaced. The code segment demonstrates a simple case for a startup and shutdown script. For sshd and similar, start, stop, reload, restart are all that is wanted. Sometimes refinements are too abstract or horrible abominations. Other times refinements are easily followed. Abstraction can only hide complexity and if complexity is buried too deep troubleshooting is more difficult or impossible.

To restate the original question, how does the presented type of sysv init script's maintainability and troubleshooting on a server compare vs the source code of Systemd or Upstart? Probably that's not quite fair, the source code of sysv init should be included. Perhaps the OpenRC code should be thrown in as well.

To beat long dead horses, speaking to most of the items you and others cite having systemd handle, many other solutions exist and are used daily with a startling (for many apparently) level of success. Package managers, OS configuration management systems, system logs, inetds, daemons writing PID's, init scripts which "carefully sanitize" their env, various watchdog and ha tools, cats and dogs living together, etc. all exist and are usable when needed. Each includes a level of complexity and abstraction. Are those levels higher or lower than an all-in-one solution? The expectation that an all-in-one solution will be better than every one of a set of more focused specialized solutions at meeting disparate lists of real needs seems far fetched. There was already a long discussion on how logging needs are not met by a current all-in-one approach, but are already met by multiple specialized solutions.

Another question, what is the impact of a critical bug in an init script versus a critical bug in the code of a do-it-all init, package management, OS config management, inetd, watchdog, make the pid file, sanitize the env, feed the dogs and cats together, and system logging solution?

From anti-systemd to pro-systemd in the shortest time

Posted Feb 2, 2014 1:31 UTC (Sun) by vonbrand (subscriber, #4458) [Link]

I very much prefer to trust/audit one implementation of the complex dance to sanitize the environment, drop privileges, detach from the terminal and the rest than having to trust a few hundred service providers to do it right, all of them (one screwing up is enough for a security breach).

From anti-systemd to pro-systemd in the shortest time

Posted Feb 2, 2014 5:30 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

>Another question, what is the impact of a critical bug in an init script versus a critical bug in the code of a do-it-all init, package management, OS config management, inetd, watchdog, make the pid file, sanitize the env, feed the dogs and cats together, and system logging solution?

My own story - a bug in BIND initscript forced me to go to our datacenter at 3am to restart a server stuck in a reboot. You see, BIND had a bug in DNSSEC implementation which caused it to hang indefinitely after SIGSTOP.

And its initscript had (and still has) these wonderful lines:
>if [ -n $pid ]; then
> while kill -0 $pid 2>/dev/null; do
> log_progress_msg "waiting for pid $pid to die"
> sleep 1
> done
>fi

So the reboot sequence can be stalled forever by a SINGLE F___NG INITSCRIPT. And _of_ _course_ this happens after the server becomes inaccessible over the network.
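
For contrast, a stop routine that cannot stall forever needs a deadline and an escalation step. A rough Python sketch of that pattern, not the BIND script's logic; the function name, defaults, and injectable kill/sleep/clock parameters are my own and exist mainly to make it testable:

```python
import os
import signal
import time


def stop_process(pid, timeout=10.0, poll=0.5,
                 kill=os.kill, sleep=time.sleep, clock=time.monotonic):
    """Send SIGTERM, wait up to `timeout` seconds, then escalate to SIGKILL.

    Returns "none" if the process was already gone, "SIGTERM" if it exited
    in time, or "SIGKILL" if escalation was needed.
    """
    def alive():
        try:
            kill(pid, 0)  # signal 0 only checks that the pid exists
            return True
        except ProcessLookupError:
            return False

    try:
        kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "none"

    deadline = clock() + timeout
    while alive():
        if clock() >= deadline:
            try:
                kill(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # it exited between the check and the kill
            return "SIGKILL"
        sleep(poll)
    return "SIGTERM"
```

Like the original loop, this trusts the pid, so it inherits the pid-reuse problem mentioned elsewhere in the thread; it only fixes the unbounded wait.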

I wanted to propose the reintroduction of the death penalty for anyone advocating hacky unreliable solutions for basic functionality, but my colleagues disagree. They think that 30 years in a high security prison should be enough for that crime.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 6, 2014 15:23 UTC (Thu) by cdmiller (guest, #2813) [Link] (2 responses)

A critical bug appearing inside a do-it-all replacement binary is somehow better than your described BIND init script bug?

From anti-systemd to pro-systemd in the shortest time

Posted Feb 6, 2014 20:36 UTC (Thu) by Wol (subscriber, #4433) [Link]

You miss the point.

A SINGLE bug in systemd - *especially* a critical one - will be found and fixed very quickly.

There are *thousands* of places a bug like that in the BIND script could hide, to pop up at the most inconvenient moment. Each one needing to be individually splatted.

And if upstream are impervious to reports/fixes (note the OP said this code is still there...) then it needs to be re-splatted every time you do an upgrade or re-install.

What's better - one bug that's easily fixed once found, or hundreds of bugs that you might or might not stumble across?

Cheers,
Wol

From anti-systemd to pro-systemd in the shortest time

Posted Feb 6, 2014 21:13 UTC (Thu) by anselm (subscriber, #2796) [Link]

Note that a critical bug in systemd's PID 1 will probably be found and removed very quickly, whereas a critical bug in some init script may not even be encountered by most users, may be difficult to reproduce and locate in the code and even more difficult to fix given the circumstances under which init scripts are generally run.

Contrary to popular belief systemd's PID 1 is not a very large program, and it is common to all systemd-using distributions, with a sizeable and active team of developers behind it, while an init script is typically specific to one single distribution and may have just one single maintainer who may have more important things to do with their time than chase other people's obscure bugs. Hence bugs in systemd are much more likely to be fixed than bugs in init scripts for services like BIND, which only even run on a small percentage of machines in the first place.

To quote Mark Twain: Behold the fool saith, »Put not all thine eggs in the one basket« — which is but a manner of saying, »Scatter your money and your attention«; but the wise man saith, »Put all your eggs in the one basket and — watch that basket!«

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 13:10 UTC (Fri) by HelloWorld (guest, #56129) [Link] (1 responses)

That's actually interesting as Lennart praised upstart for its clean source code.

From anti-systemd to pro-systemd in the shortest time

Posted Jan 31, 2014 19:55 UTC (Fri) by hummassa (subscriber, #307) [Link]

I am not bashing upstart's source code. I am saying that reading both inits' source code convinced me that, for linux at least, systemd is more adequate. And that implementing the things that upstart lacks properly would be on par with rewriting systemd.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 17:09 UTC (Thu) by raven667 (subscriber, #5198) [Link] (3 responses)

It's still a democratic vote and Canonical is a large partner with Debian so their opinion matters and should be taken into account. It might not pick the "right" answer and the technically superior option but it's still *fair*.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 18:20 UTC (Thu) by anselm (subscriber, #2796) [Link] (2 responses)

The tech-ctte is not a democratically elected body, and its composition is not intended to reflect the composition of the Debian community. On the other hand, it is the technical committee and, as such, ought to arrive at its resolutions by looking at technical, rather than political, issues – which is why the people on the committee are picked because they supposedly have a deep understanding of the technical side of Debian and can be relied upon to make impartial decisions on that basis.

Considering this, it is very aggravating if the position of someone on the tech-ctte looks as if it hinges not on technical points, but on the agenda of whoever signs their paycheck. The Upstart side of the debate has so far been fairly light on technical detail but heavy on mud-slinging and obfuscation, which is disgraceful and an insult to those people on the committee who actually went and performed their own research without a preconceived notion of what the result ought to be.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 20:22 UTC (Thu) by HelloWorld (guest, #56129) [Link] (1 responses)

> it is the technical committee and, as such, ought to arrive at its resolutions by looking at technical, rather than political, issues – which is why the people on the committee are picked because they supposedly have a deep understanding of the technical side of Debian and can be relied upon to make impartial decisions on that basis.

That seems a bit naive. Non-trivial technical decisions are about tradeoffs. Systemd trades portability for features, performance, robustness, integration. Which of these should be preferred isn't a technical question; it depends on the goals of the users and the project's developers, and those are hard to determine. That's not a technical question any longer, that's a social-scientific question. I think I know the answer, but others seem to have more doubts.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 20:47 UTC (Thu) by anselm (subscriber, #2796) [Link]

> […] it depends on the goals of the users and the project's developers, and those are hard to determine.

Which raises the question of why one would ask the technical committee to make the decision in the first place. The tech-ctte's job is to arbitrate technical disagreements between developers; it isn't the tech-ctte's job to gauge »users' and developers' goals« in order to prescribe project policy.

Since we're very probably going to have a GR anyway, we can only hope that the Debian developers will at least look at the tech-ctte's deliberations before voting. This should acquaint them not only with, e.g., Russ Allbery's very detailed write-up explaining why, based on his own unprejudiced comparison, he prefers systemd to Upstart, but also with the discussion tactics of the Canonical camp.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 11:35 UTC (Thu) by jonnor (guest, #76768) [Link]

If the aim is to reach "consensus", not just getting one's way, one should be careful with strategic voting.

A TC cannot decide on a political issue

Posted Jan 30, 2014 11:47 UTC (Thu) by avheimburg (guest, #75272) [Link] (6 responses)

To me, this looks like a political decision and not a technical one. Thus, it is very hard if not impossible to decide using technical criteria.

Is Debian a GNU distribution whose Linux variant happens to have the biggest install base, or is Debian a Linux distribution with some ports? Who is the target audience of the distribution? For what usage scenarios does Debian want to optimize? That is the discussion that should be happening.

Once the political stage is set, the technical decision becomes much easier.

A TC cannot decide on a political issue

Posted Jan 30, 2014 14:01 UTC (Thu) by anselm (subscriber, #2796) [Link] (3 responses)

According to Debian's Social Contract, its priorities are »its users and free software«. If more than 99.9% of these users are using the Linux kernel, it stands to reason that Debian ought to provide the best experience to this vast majority. This effectively implies making systemd the default init system.

A TC cannot decide on a political issue

Posted Jan 30, 2014 15:00 UTC (Thu) by avheimburg (guest, #75272) [Link] (2 responses)

That's what you are reading into it. It never says so explicitly. And whether your reading of the Social Contract is the correct reading or not is the kind of political issue that has to be answered.

As you point out, when that has been answered, the technical decision is not hard.

A TC cannot decide on a political issue

Posted Jan 30, 2014 15:22 UTC (Thu) by anselm (subscriber, #2796) [Link] (1 responses)

What part of §4 of the Social Contract do you find unclear?

A TC cannot decide on a political issue

Posted Jan 30, 2014 16:44 UTC (Thu) by jhoblitt (subscriber, #77733) [Link]

The part where it's missing a section requiring community members with obvious conflicts of interest to recuse themselves from TC votes...

A TC cannot decide on a political issue

Posted Jan 30, 2014 15:09 UTC (Thu) by cortana (subscriber, #24596) [Link]

IMO sending the broader political decision ("I'd like to ask the tech-ctte to please vote on and decide on the default init system for Debian.") to the ctte was a bad move. If the question had been 'which init system should Debian GNU/Linux use', then the question would have been a purely technical one.

A TC cannot decide on a political issue

Posted Jan 30, 2014 17:16 UTC (Thu) by raven667 (subscriber, #5198) [Link]

You are right in that it is now a political decision *because* the technical criteria and politics do not align. If Upstart were the technically superior system then this would be an easy decision and already made. The reason this is contentious is that everyone knows Upstart is not the best, even the Upstart developers, or they would be able to engage based on technical criteria rather than politics.

The question is whether Debian can make independent decisions for the betterment of itself and its users or if it is just part of Ubuntu and needs to submit to whatever decisions are made for Ubuntu, which is the most popular version of Debian.

This week in "As the Technical Committee Turns"

Posted Jan 30, 2014 16:43 UTC (Thu) by fandingo (guest, #67019) [Link]

It seems to me that they could move the debate forward by eliminating at least one option. At the very least, that would make it harder for Ian to game the preferential voting system.

Get OpenRC out of the debate. It had no business being there in the first place.

new ballot proposed

Posted Feb 1, 2014 18:29 UTC (Sat) by HelloWorld (guest, #56129) [Link] (6 responses)

See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=727708#4827
It seems like they want to vote on this on Monday.

new ballot proposed

Posted Feb 1, 2014 18:42 UTC (Sat) by vonbrand (subscriber, #4458) [Link] (2 responses)

Just great. Propose a vote with options that (if selected) would delay Jessie for a few years (systemd or upstart with no alternative, mandate OpenRC).

new ballot proposed

Posted Feb 1, 2014 20:16 UTC (Sat) by HelloWorld (guest, #56129) [Link] (1 responses)

Huh? Nobody said anything about a “no alternative” option. In fact there seems to be a consensus that sysvinit must be supported in jessie at least to allow for smooth upgrades.

new ballot proposed

Posted Feb 1, 2014 20:34 UTC (Sat) by raven667 (subscriber, #5198) [Link]

sysvinit scripts are well supported by either Upstart or systemd so there is no flag day that says all software must drop their sysvinit scripts to be supported by the modern init. It is also a fact that many systemd unit files already exist, because daemons are already packaged for one of the other distros, and in many cases the unit files are maintained by the upstream project.

new ballot proposed

Posted Feb 4, 2014 18:47 UTC (Tue) by kugel (subscriber, #70540) [Link] (2 responses)

The vote hasn't happened yet, unfortunately.

new ballot proposed

Posted Feb 5, 2014 10:01 UTC (Wed) by HelloWorld (guest, #56129) [Link] (1 responses)

It was postponed to today (as can be seen from the messages further down).

new ballot proposed

Posted Feb 5, 2014 19:04 UTC (Wed) by foom (subscriber, #14868) [Link]

Ahahaha, vote called......And...cue "but but you didn't WAIT LONG ENOUGH YET" email...from a new actor, this time.

https://lists.debian.org/debian-ctte/2014/02/msg00095.html

This week in "As the Technical Committee Turns"

Posted Feb 4, 2014 10:49 UTC (Tue) by lainfinity (guest, #94972) [Link] (10 responses)

"Programming in the real world is all about managing complexity. Tools to manage complexity are good things. But when the effect of those tools is to proliferate complexity rather than to control it, we would be better off throwing them away and starting from zero."

The Art of Unix Programming
Eric Steven Raymond

Please don't forget this important part of the Unix wisdom when you make your decision.

This week in "As the Technical Committee Turns"

Posted Feb 4, 2014 12:25 UTC (Tue) by anselm (subscriber, #2796) [Link] (9 responses)

Yes. Isn't it a great idea to start from zero – to get rid of the complexity, to finally replace the current baroque hodgepodge of System-V init, inetd, cron, loads and loads of distribution-specific early-boot tools and init scripts, assorted home-grown service trackers, all of which are configured differently from one another and don't communicate at all well, with one single comprehensive, modular system which uses one unified declarative configuration system, allows upstream packages to provide service configuration files instead of forcing distributions to make their own, and comes with extensive documentation, defined interfaces and a compatibility promise? Much like the designers of Unix replaced the sprawling and device-specific I/O APIs of early-1970s operating systems with a groundbreaking set of very simple operations (as in »everything is a file«). It is the »Unix philosophy« at its best.

Starting from zero as far as the init system is concerned is a step that most of the other popular Unix systems have already taken long ago, and that was long overdue for Linux. We're now in the fortunate position that somebody actually volunteered to take on this task, and it would be silly for Linux users not to want to reap the benefits.

This week in "As the Technical Committee Turns"

Posted Feb 4, 2014 16:51 UTC (Tue) by lainfinity (guest, #94972) [Link] (8 responses)

The complexity I AM addressing is Optional Complexity and not Essential Complexity.

The Optional Complexity should not be forced on everyone and every system and made mandatory irrespective of its benefits. It should be optional.

It is an abomination and a violation of the principles of the Open Source Movement, the Free Software Philosophy, Hacker Culture and the Unix Philosophy to enforce Optional Complexity.

Complexity

OpenRC (0.9.3): sysvinit + 300 files, ~30k lines, 3.3k posix sh, ~12k C
Upstart (1.5): 285 files, ~185k lines, ~97k C
Debian: sysvinit + 120 files, 5.8k lines
systemd (v44+): dbus + glib + 900 files, 224k lines, 125k C
sysvinit: 560kB, 75 files, ~15k lines
D-Bus: 11MB, ~500 files, 300k lines, 120k C
glib: 72MB, ~2500 files, ~1.7M lines, ~430k C

Debian startup is the smallest: it's only shell, with sysvinit (C) as a dependency.
Upstart is about 10 times bigger than Debian startup in terms of lines of code/text. Most of the extra size comes from C.
OpenRC is about twice as big as Debian startup. The size difference is mostly the OpenRC core written in C, which expands the footprint from ~3k LoC to ~15k LoC compared to shell.
systemd is about 10 times bigger than Debian startup, like Upstart. But with the mandatory deps it blows up to about one hundred times the code footprint! Most of the extra code is in mandatory dependencies, but the systemd core is also bigger than anything else.

http://wiki.gentoo.org/wiki/Talk:Comparison_of_init_systems
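
For anyone who wants to check the ratios, they can be reproduced from the listed figures with a few lines of Python. This is only a sketch: it uses the rounded total-line counts from the list, it assumes that "sysvinit +" and "dbus + glib +" mean the dependency's lines are simply added on, and the names `footprint` and `ratios` are made up for illustration.

```python
# Total line counts in thousands, rounded as in the list above.
SYSVINIT = 15
DBUS = 300
GLIB = 1700

# Assumption: "X + N lines" means the dependency's lines are added to the total.
footprint = {
    "Debian": SYSVINIT + 5.8,
    "OpenRC": SYSVINIT + 30,
    "Upstart": 185,
    "systemd (core only)": 224,
    "systemd (with D-Bus and glib)": DBUS + GLIB + 224,
}


def ratios(table, baseline="Debian"):
    """Return each entry's size relative to the baseline entry."""
    base = table[baseline]
    return {name: size / base for name, size in table.items()}


if __name__ == "__main__":
    for name, ratio in ratios(footprint).items():
        print(f"{name}: {ratio:.1f}x")
```

Under those assumptions OpenRC comes out a little over 2x, Upstart a little under 9x, the systemd core a little under 11x, and systemd plus D-Bus and glib a little over 100x the Debian figure. Whether these are the right things to count is a separate question.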

This week in "As the Technical Committee Turns"

Posted Feb 4, 2014 17:33 UTC (Tue) by vonbrand (subscriber, #4458) [Link] (3 responses)

That is a nonsense comparison. Compare LOC for the same functionality (can't do, all non-systemd fall far short even with all sorts of add-ons). You are leaving out the massive chunk of "libraries" of shell code, and all the programs called (ps, grep, sed, ...)

This week in "As the Technical Committee Turns"

Posted Feb 4, 2014 23:29 UTC (Tue) by lainfinity (guest, #94972) [Link] (2 responses)

Cyclomatic Complexity and Lines of Code: Empirical Evidence of a Stable Linear Relationship.

http://www.scirp.org/journal/PaperInformation.aspx?paperI...

This week in "As the Technical Committee Turns"

Posted Feb 4, 2014 23:52 UTC (Tue) by anselm (subscriber, #2796) [Link]

So? As has been mentioned before, for the SysV init/inetd/… conglomerate you need to add the LOC for the shell and every single external program that is used in your boot scripts, init scripts, etc. Suddenly systemd, which does not even require a shell to boot the system, no longer looks that big.

Also, cyclomatic complexity has nothing to do with the »complexity« of a software system that has no unity of design, no well-defined interfaces, no common configuration file format, hardly any portability between Linux distributions, and little documentation beyond a few individual man pages, i.e., SysV init and friends. The cognitive burden on system administrators who need to be able to configure, maintain, and debug this system, as well as new users who need to learn to deal with it, and instructors who need to teach it to new users (not to mention upstream and distribution developers who have to deal with lots of not-quite-interchangeable ways to do the same thing), is considerable. On the other hand, a modern approach like systemd, which greatly decreases the number of separate abstractions in use and serves as a unifying element between formerly disparate Linux distributions, has the potential to reduce this burden to a large degree – which benefits system administrators, developers, distribution providers, and instructors alike.

This week in "As the Technical Committee Turns"

Posted Feb 5, 2014 1:38 UTC (Wed) by vonbrand (subscriber, #4458) [Link]

Cyclomatic complexity is essentially the number of branches in the code. Small wonder that a roughly constant fraction of statements are branches...

And that is relevant in this discussion why?
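
To make the "number of branches" point concrete, here is a rough sketch that approximates cyclomatic complexity for a piece of Python source by counting decision points with the standard `ast` module. The set of node types counted and the names `BRANCH_NODES`, `approx_cyclomatic` and `SAMPLE` are assumptions made for this example, not the rules of any particular metrics tool.

```python
import ast

# Node types treated as decision points. This is a simplification chosen for
# this sketch, not the rule set of any particular metrics tool.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def approx_cyclomatic(source: str) -> int:
    """Approximate cyclomatic complexity as 1 + number of decision points."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, BRANCH_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" has three operands and adds two extra paths.
            decisions += len(node.values) - 1
    return 1 + decisions


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''

if __name__ == "__main__":
    # Two ifs, one for loop and one "and" give four decision points.
    print(approx_cyclomatic(SAMPLE))
```

Since the count only goes up when a branching statement appears, it is not surprising that it tracks the number of statements fairly closely in ordinary code.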

This week in "As the Technical Committee Turns"

Posted Feb 5, 2014 17:20 UTC (Wed) by lsl (subscriber, #86508) [Link] (3 responses)

This "analysis" is so totally flawed that it's not even funny.

Remarkably, the ~900 files count roughly corresponds to the number of files in the whole src/ subtree of the systemd git repo. That includes udev. And multiple libraries for talking to udev. You also include a whole slew of command programs for various purposes like a top(1) equivalent for cgroups or a program for generating bootchart graphs in SVG format.

Those things are all nice and stuff but they don't have anything to do with PID 1. You'll probably run udev even when going with another /sbin/init.

Now what about glib? It's not loaded into my systemd process. So where does it come from? Apparently from gudev which is a convenience library(!) for talking to udev.

Then there's the whole of logind. And more.

Now, you don't have any clue what you're talking about, right? The other possibility would be worse.

This week in "As the Technical Committee Turns"

Posted Feb 5, 2014 18:13 UTC (Wed) by paulj (subscriber, #341) [Link] (1 responses)

So, just a little observation.

While your comment seems technically very informative, the "You don't have any clue ..." comment at the end possibly is not terribly productive. It's personal and it's the kind of thing that can make discussions more heated than they need to be. If the other person's understanding is deficient, it is more than enough to demonstrate that with technical points (and the audience will hold that to your credit). :)

This week in "As the Technical Committee Turns"

Posted Feb 5, 2014 18:56 UTC (Wed) by lsl (subscriber, #86508) [Link]

Yes, you're right. It was unnecessary. I added a comment.

Thanks for noting it.

This week in "As the Technical Committee Turns"

Posted Feb 5, 2014 18:55 UTC (Wed) by lsl (subscriber, #86508) [Link]

Scrap the last part. This was unwarranted. Sorry for that, lainfinity.