Poettering: Revisiting how we put together Linux systems
Posted Sep 2, 2014 20:20 UTC (Tue) by NightMonkey (subscriber, #23051)
In reply to: Poettering: Revisiting how we put together Linux systems by jackb
Parent article: Poettering: Revisiting how we put together Linux systems
Those are behaviors that shouldn't be in the init system, if you like the UNIX philosophical model of "do one thing and do it well." They complicate an already complex job that is better done by task-specific, narrowly scoped tools. Monit, Puppet, Chef, watchdog, and many other programs can do that simply defined task and do it well. And fix those crashing daemons! Crashing should never become accepted, routine program behavior! :)
For the console output, can't syslog-ng (or rsyslog, or similar) do that?
Posted Sep 2, 2014 20:51 UTC (Tue)
by dlang (guest, #313)
[Link] (6 responses)
There's nothing preventing you from piping the output of any program into logger (or equivalent) to have that data sent to syslog-ng or rsyslog (something like 2> >(logger -t appname.err) | logger -t appname.stdout, or similar).
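A minimal sketch of that approach, assuming bash (for the process substitution) and a hypothetical foreground daemon called mydaemon:

    #!/bin/bash
    # Run the (hypothetical) daemon in the foreground and split its streams:
    # stderr is tagged "mydaemon.err", stdout is tagged "mydaemon.out".
    # syslog-ng or rsyslog can then filter and route on those tags.
    mydaemon 2> >(logger -t mydaemon.err) | logger -t mydaemon.out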
Posted Sep 4, 2014 15:20 UTC (Thu)
by xslogic (guest, #97478)
[Link] (5 responses)
Posted Sep 4, 2014 15:51 UTC (Thu)
by raven667 (subscriber, #5198)
[Link] (4 responses)
Posted Sep 4, 2014 22:10 UTC (Thu)
by Darkmere (subscriber, #53695)
[Link] (3 responses)
Posted Sep 4, 2014 22:36 UTC (Thu)
by dlang (guest, #313)
[Link] (2 responses)
Why do so few people see the problems with this sort of thing?
Posted Sep 4, 2014 23:25 UTC (Thu)
by anselm (subscriber, #2796)
[Link]
The non-forking daemon approach as recommended for systemd is what basically every init system other than System-V init prefers (check out Upstart or, e.g., the s6 system mentioned here earlier). It allows the init system to notice when the daemon quits because it will receive a SIGCHLD, and the init system can then take appropriate steps like restart the daemon in question. In addition, it makes it reasonably easy to stop the daemon if that is necessary, because the init process always knows the daemon's PID (systemd uses cgroups to make this work even if the daemon forks other processes).
The »double-forking« strategy is needed with System-V init so that daemon processes will be adopted by init (PID 1). The problem with this is that in this case the init process does receive the notification if the daemon exits but has no idea what to do with it. The init process also has no idea which daemons are running on the system in the first place and where to find them, which is why many »proper Linux daemons« need to write their PID to a file just so the init script has a fighting chance of being able to perform a »stop« – but this is completely non-standardised, requires custom handling in every daemon's init script, and has a certain (if low) probability of being wrong.
In general it is a good idea to push this sort of thing down into the infrastructure rather than to force every daemon author to write it out themselves. That way we can be reasonably sure that it actually works consistently across different services and is well-debugged and well-documented. That this hasn't happened earlier is only too bad but that is not a reason not to start doing it now. People who would like to run their code on System-V init are free to include the required song and dance as an option, but few if any systems other than Linux actually use System-V init these days – and chances are that the simple style of daemon that suits systemd better is also more appropriate for the various init replacements that most other Unix-like systems have adopted in the meantime.
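As a rough sketch of the System-V »song and dance« described above, assuming a hypothetical daemon food that double-forks and writes its own PID file (the --pidfile option is invented for illustration), every init script has to carry something like this by hand:

    #!/bin/sh
    # Hypothetical SysV-style init script fragment for a daemon "food".
    PIDFILE=/var/run/food.pid

    case "$1" in
      start)
        # food double-forks, detaches from the terminal and records its own PID.
        food --pidfile "$PIDFILE"
        ;;
      stop)
        # Only works if the PID file exists, is current, and the PID has not
        # been reused by an unrelated process.
        [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")"
        rm -f "$PIDFILE"
        ;;
    esac

Under systemd or a daemontools-style supervisor, none of this is the daemon author's problem: the supervisor stays the parent and already knows the PID.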
Posted Sep 5, 2014 8:34 UTC (Fri)
by cortana (subscriber, #24596)
[Link]
Posted Sep 2, 2014 22:47 UTC (Tue)
by jackb (guest, #41909)
[Link]
That's the kind of philosophy that's useless to me. I have a lot of work to get done. I don't have time to fix all the broken daemons in the world. I welcome tools that help me get my work done and reject tools that get in my way.
Unfortunately, systemd is complicated because it contains a mixture of both, so I'm always ambivalent.
Posted Sep 3, 2014 10:26 UTC (Wed)
by cortana (subscriber, #24596)
[Link] (13 responses)
If they do it by running '/etc/init.d/foo status' then, no, they can't.
Posted Sep 3, 2014 15:26 UTC (Wed)
by NightMonkey (subscriber, #23051)
[Link] (12 responses)
Posted Sep 3, 2014 15:46 UTC (Wed)
by raven667 (subscriber, #5198)
[Link] (11 responses)
Posted Sep 3, 2014 16:09 UTC (Wed)
by NightMonkey (subscriber, #23051)
[Link] (10 responses)
More work is needed to make the relationship between users and developers LESS obscured, not more. When I reported a core Apache bug to the Apache developers in 1999, I had a fix in 24 hours, and so did everyone else. Now, if instead I just relied on some system to restart Apache, that bug might have gone unnoticed and unfixed, at least for longer.
Systems like this are a band-aid. Putting more and more complex systems in as substitutes for bug fixing and human-to-human communication is the problem.
Posted Sep 3, 2014 16:23 UTC (Wed)
by raven667 (subscriber, #5198)
[Link] (9 responses)
I'm not sure what you are talking about. Nagios check_procs, for example, just shells out to ps to walk /proc; it is not the parent of the services and doesn't have the same kind of ironclad handle on their execution state that something like runit or daemontools or systemd has (see the sketch below).
> Systems like this are a band-aid.
You are never going to fix every piece of software out in the world so that it never crashes. The first step is to admit that such a problem is possible; then you can go about mitigating the risks, rather than building fragile systems that pretend the world is perfect and fall apart as soon as something doesn't go right, and certainly not as some form of self-punishment to cause pain in order to force bug fixes.
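To make the check_procs point concrete: a polling check of this sort (the service name httpd is just an example) only samples the process table at the moment it runs, while a supervisor such as runit, daemontools or systemd is the parent of the service and is told immediately, via SIGCHLD, when it exits.

    #!/bin/sh
    # Nagios-style polling check: counts "httpd" processes by walking the
    # process table; critical unless exactly one is running. It cannot say
    # when or why a process died, only that it is not there right now.
    check_procs -c 1:1 -C httpd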
Posted Sep 3, 2014 16:34 UTC (Wed)
by NightMonkey (subscriber, #23051)
[Link] (8 responses)
I just think a lot of this is because of decisions made to keep the binary-distro model going.
I'm not interested in fixing all the softwares. :) I am interested in getting the specific software I need, use and am paid to administer in working order. There are certainly many ways to skin that sad, sad cat. ;)
Posted Sep 4, 2014 22:05 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (7 responses)
The sysadmin did a "shutdown -r". The system (using init scripts) made the mistake of shutting the network down before it shut bind down. Bind - unable to access the network - got stuck in an infinite loop. OOPS!
The sysadmin, 3000 miles away, couldn't get to the console or the power switch, and with no network couldn't contact the machine ...
If a heavily-used program like bind can have such disasters lurking in its sysvinit scripts ...
And systemd would just have shut it down.
Cheers,
Wol
Posted Sep 4, 2014 23:02 UTC (Thu)
by NightMonkey (subscriber, #23051)
[Link] (6 responses)
I don't care how "robust" the OS is. It's just being cheap that gets your organization into these kinds of situations (and that *is* an organizational problem, not just yours as the sysadmin.)
Posted Sep 4, 2014 23:15 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (5 responses)
And then it breaks because of a race condition that happens in only about 10% of cases.
The sysadmin in this dramatization was me, and the offending script was: http://anonscm.debian.org/cgit/users/lamont/bind9.git/tre... near line 92.
Of course, REAL Unix sysadmins must live in the server room, spending 100% of their time tweaking init scripts and auditing all the system code to make sure it can NEVER hang. NEVER. Also, they disable memory protection because their software doesn't need it.
Posted Sep 4, 2014 23:20 UTC (Thu)
by NightMonkey (subscriber, #23051)
[Link] (4 responses)
Again, the answer to system hangs (which are *inevitable* - this is *commodity hardware* we're talking about most of the time, not mainframes) is remote power booters. I don't like living in the DC, myself.
Posted Sep 4, 2014 23:33 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
> Again, the answer to system hangs (which are *inevitable* - this is *commodity hardware* we're talking about most of the time, not mainframes) is remote power booters.
Oh, I've witnessed mainframe hangups. Remote reboot is nice, but that server was from around 2006 so it didn't have IPMI and the datacenter where it was hosted offered only manual reboots.
Posted Sep 5, 2014 4:09 UTC (Fri)
by raven667 (subscriber, #5198)
[Link] (2 responses)
That sounds dangerously close to internalizing and making excuses for unreliable software rather than engineering better systems that work even in a crazy, imperfect world. Duct-taping an RPS to the side of a machine is in addition to, not a replacement for, making it work right in the first place.
Posted Sep 5, 2014 4:43 UTC (Fri)
by NightMonkey (subscriber, #23051)
[Link] (1 responses)
I don't think the two things you are describing are actually separate tasks. All software has bugs.
Posted Sep 5, 2014 14:45 UTC (Fri)
by raven667 (subscriber, #5198)
[Link]