The Debian init system general resolution returns
Posted Oct 17, 2014 5:54 UTC (Fri) by flewellyn (subscriber, #5047)
Parent article: The Debian init system general resolution returns
Posted Oct 17, 2014 6:15 UTC (Fri)
by cebewee (guest, #94775)
[Link] (45 responses)
Posted Oct 17, 2014 6:50 UTC (Fri)
by jspaleta (subscriber, #50639)
[Link] (44 responses)
There isn't even a current active bug at the heart of this... it's a policy fix for integration problems that have come up and already been resolved through existing collaboration. I have some modicum of faith that the vote will not pass... and technical issues will be resolved as they come up in the future, as they have thus far been dealt with.
This GR is effectively an unfunded mandate of the worst kind: there's no manpower in place to deal with the RC bugs this GR will create in the future. And moreover I don't really think anyone knows what the secondary impacts of this GR will be if it passes. There could be some very critical stuff that doesn't work with one of the alternative toy inits already in the repo. It's not like people actively test full workloads with minit. But this GR doesn't really make a distinction between toy and full-featured inits or place the value of one init over another. So if something doesn't work with minit but works with sysvinit right now... it becomes a blocker as an unintended side effect. And that's just super silly.
Posted Oct 17, 2014 7:21 UTC (Fri)
by josh (subscriber, #17465)
[Link] (38 responses)
systemd-shim already exists, and will continue to exist as long as developers take the time to maintain it. If the shim stops drawing enough developer interest to maintain ongoing compatibility, then it will stop working naturally. This GR will not magically make such interest materialize.
Posted Oct 17, 2014 10:36 UTC (Fri)
by man_ls (guest, #15091)
[Link] (37 responses)
Posted Oct 17, 2014 13:33 UTC (Fri)
by yaap (subscriber, #71398)
[Link] (36 responses)
Seriously, there are big transitions on-going or soon to come in the Linux user space. Init/systemd discussed here, but later X/Wayland and also filesystems with btrfs (whose features may be increasingly used by other software, and become expected at some point in the future). Big transitions like this are never free and perfectly smooth, let's not kid ourselves. But at some point a majority will want to move forward, and it will happen.
It's perfectly fine not to like any such big change. Any person ok with the previous situation will suffer some instability and changes (learning/retraining has some cost) and even lose some particular feature. This person has the right to be unhappy about the change. But then it is NOT ok IMHO for such a person to feel entitled to continuing support of the old platform, or even worse to coerce people into ongoing support by playing tricks like this GR.
1) stick with older/stable distros for as long as the transition takes, and jump only after the dust has fully settled. You will still migrate, with some cost, but minimize the impact this way;
2) actively support the old scheme to keep it alive. Either by your work or by funding people.
Here, this GR wants to force others to do what Ian wants. This is just not acceptable IMHO.
BTW: I expect the Linux world to be rocky for a few years, with all the big transitions I mentioned above. As far as I'm concerned I'll stick to Debian; I'm tracking Jessie/testing and, despite the odd glitch, am ok with it. I believe in the end it'll be fine. I like the idea of an init/platform configuration system that supports a dynamic environment like a modern device or server container, I like the prospect of having a secure graphical environment at last, and I like cheap snapshots, send/receive and integrity. I'm sure not everything that will happen will be to my taste, but hey, considering all I get for free I'm not complaining.
Posted Oct 17, 2014 21:48 UTC (Fri)
by man_ls (guest, #15091)
[Link] (1 responses)
I don't feel entitled to any of it, and I'm open to some degree of breakage since I'm using testing. But I will be very sad if decades of rock-solid stability go away, just for changing init systems -- which IMHO is not such a big advantage. No doubt other people have different priorities, but the point is: Debian has usually progressed forward slowly and smoothly, but without pause; and they should not burn devs and users alike because of a sudden urge to do some particular change. So on the surface, this GR looks like the right thing to do.
Posted Oct 18, 2014 1:18 UTC (Sat)
by misc (subscriber, #73730)
[Link]
Posted Oct 19, 2014 1:25 UTC (Sun)
by zblaxell (subscriber, #26385)
[Link] (33 responses)
In some ways they are very similar. btrfs has an in-place migration tool from ext4 that almost works, and with LVM snapshots and merging the transition can be made almost live. We can't mount two copies of the same btrfs on the same kernel because of the way btrfs assembles multi-device filesystems in the kernel using UUIDs--a limitation that firmly closes the door on several existing disaster recovery and data management procedures. I don't disagree that the systemd transition experience is similar to the upcoming btrfs transition experience. They're both awful.
Right now only fools and filesystem geeks put hard dependencies on btrfs. I am a filesystem geek, so I have dozens of btrfs filesystems. Nothing in btrfs works properly. It has been 0 months since the last commit to fix filesystem-corrupting bugs in basic Unix filesystem features like rename() and fsync(), 1 week since I last had to rebuild a btrfs filesystem from backups using mkfs and rsync because of metadata corruption that btrfsck couldn't repair, 2 days since I had to clean up data corruption on a btrfs filesystem that passes btrfs scrub and check, 1 day since I hit a kernel-crashing BUG() in the btrfs code, and 0 days since I last had to reboot a machine because of assorted hang bugs that still exist in the rename() and mkdir() system calls of 3.17.1. And all that covers just the core filesystem features that ext4 and xfs can already do with better performance. New features like snapshots and send that might be attractive to application developers work even less well (and are already implemented in an incompatible way by ZFS).
At the moment, if someone said I must use btrfs and exclude any other choices, my answer--and the answer of any sane distro maintainer--would be "hahaha no." Each day that passes makes FUSE implementations of the btrfs features (yes, they exist!) on top of any other existing filesystem seem more attractive. I don't doubt that btrfs will get better, but I do doubt whether it will get better fast enough to capture the interest of application developers who might notice that all these features work in ZFS today, and never look back.
I'm not sure why systemd--which is 1/3 the age of btrfs and whose behavior has greater impact on much more than mere storage--should be treated with less skepticism. I would (and have!) adopt btrfs before systemd, simply because I can choose to use as much btrfs--or as little--as I like, without changing distros or even rebooting.
When systemd gets to the point where it can coexist with other software with overlapping functionality--as X11 and btrfs can do--there is literally no reason to object to systemd any more. I don't see why this concept is so hard for systemd people to understand, or why they haven't bothered to fix their implementation yet.
Posted Oct 19, 2014 3:16 UTC (Sun)
by pizza (subscriber, #46)
[Link] (30 responses)
Maybe because systemd doesn't eat your data if something goes awry?
Maybe because, with systemd, they've produced something that, at the very worst, is just as reliable as what it replaces?
> I don't see why this concept is so hard for systemd people to understand, or why they haven't bothered to fix their implementation yet.
Oh, it's pretty simple -- what you consider a "bug" to be "fixed" is a deliberate, fundamental design decision that the entire project is built around.
Posted Oct 19, 2014 6:53 UTC (Sun)
by zblaxell (subscriber, #26385)
[Link] (26 responses)
systemd's own bug tracker contains a number of examples where systemd is worse than the things they are replacing. Aggregating failure modes and moving them up the process tree is objectively not better than leaving the failures distributed around the leaves.
The idea that systemd's monolithic design is fundamental to anything is absurd. systemd-shim is one counterexample. Just delete the cgroup code from systemd for another.
Linux now has a prctl(PR_SET_CHILD_SUBREAPER) so you don't need to be PID 1 to reap orphan processes. That was the single and only reason systemd ever needed any part of itself to be PID 1, and it no longer exists. Add ten lines of code to use the prctl, delete all the code that checks for PID 1 (holy crap there is a lot of it), and that bug is fixed.
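For readers unfamiliar with the call mentioned above, here is a minimal sketch (not taken from systemd or from the comment) of a non-PID-1 process marking itself as a child subreaper and then reaping whatever has exited. It assumes Linux 3.4 or later and calls prctl(2) through ctypes; the helper names `become_subreaper` and `reap_children` are hypothetical, chosen only for this illustration.

```python
import ctypes
import os

# Value of PR_SET_CHILD_SUBREAPER from <linux/prctl.h> (Linux 3.4 and later).
PR_SET_CHILD_SUBREAPER = 36


def become_subreaper() -> None:
    """Ask the kernel to reparent orphaned descendants to this process."""
    libc = ctypes.CDLL(None, use_errno=True)
    # prctl(2) takes an int option followed by unsigned long arguments.
    rc = libc.prctl(
        PR_SET_CHILD_SUBREAPER,
        ctypes.c_ulong(1),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
    )
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def reap_children() -> list:
    """Collect (pid, status) for every child that has already exited."""
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, status))
    return reaped
```

A supervisor built this way would call `become_subreaper()` once at startup and `reap_children()` whenever it receives SIGCHLD. The sketch says nothing about how much of systemd's PID 1 logic could actually be removed; that remains the commenter's claim.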
Seriously, systemd people. Respect valid criticism and revisit the design. If the systemd maintainers would stop insisting that its implementation-convenience hacks are immutable design decisions then we'd all be more than willing to help systemd move forward. Right now it's pointless to try to work with the project--the biggest current problems are the ones the systemd maintainers alone refuse to solve.
Posted Oct 19, 2014 7:36 UTC (Sun)
by mchapman (subscriber, #66589)
[Link] (20 responses)
Given that this was introduced by Lennart Poettering specifically so that they could support per-user systemd instances, I'm pretty sure the systemd developers are well aware of it.
> That was the single and only reason systemd ever needed any part of itself to be PID 1, and it no longer exists. Add ten lines of code to use the prctl, delete all the code that checks for PID 1 (holy crap there is a lot of it), and that bug is fixed.
I don't think that really solves much. It certainly complicates things quite a bit.
If this non-PID-1 systemd were to crash, then its children would be reparented by the real PID 1. But even if this real PID 1 were to start up a new systemd, it wouldn't be able to reparent the children back to that. A new non-PID-1 systemd would not be able to manage processes forked from the previous non-PID-1 systemd. The real PID 1 would have no option but to go into a dumb "only wait for children" mode... which is close to what systemd does already when it receives a crash-worthy signal.
At least with systemd running as PID 1 it's protected against fatal signals for which it has no configured handler (e.g. SIGKILL).
Posted Oct 25, 2014 1:18 UTC (Sat)
by zblaxell (subscriber, #26385)
[Link] (19 responses)
I find your perception of the relationship between systemd and other people sharing a machine with it to be...disturbing.
I'm mostly concerned with the ability to run systemd under some kind of supervisor without having to run a full VM or build that supervisor into systemd itself (self-supervision isn't). Since systemd does everything, different systemd instances would have to be able to do different parts of what a monolithic systemd does, but they would be isolated from each other by cgroups, namespaces, and so on, enforcing that isolation from outside.
When that supervisor wants a systemd process dead, it's a bug when that systemd process is anything but dead. All this talk about surviving fatal signals is missing the point.
Fatal signals aren't the problem I collide with. Re-exec failure is, and the only way to solve that is to not be PID 1. What happens to systemd if an upgrade fails due to a bug?
Posted Oct 25, 2014 2:31 UTC (Sat)
by viro (subscriber, #7872)
[Link] (18 responses)
More interesting class of bugs stems from the systemd propensity to spew^Wdistribute tons of information to the rest of the system. Extra complexity in a sensitive place is only a part of the problem - much worse is that if dbus-daemon can't keep pace with it, the backlog is stored in memory of PID 1 until it can be sent. Now, recall what makes PID 1 special from the OOM killer point of view...
And no, it's not a pure theory - I've run into one of the bugs in that class; a lot of umount activity going on (e.g. on shutdown with several thousand bindings present in the system) ended up with quadratic amount of dbus traffic. The damn thing kept resending the entire mount table every time it saw a change. Welcome to 8G of dirty memory held by PID 1... Basically, they were too lazy to compare the old and the new tables and send a proper delta. Sure, fixing that one hadn't been hard, but the underlying architectural deficiency is still there.
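To make the "quadratic versus delta" point concrete, here is a small sketch that counts how many mount entries get sent while tearing down n mounts one at a time, once by resending the whole table after every change and once by sending only the difference. The table layout (mount ID mapped to a `(mount_point, fs_type)` tuple) and the function names are assumptions made for this illustration; systemd's actual D-Bus message format is not reproduced here.

```python
def mount_delta(old: dict, new: dict) -> dict:
    """Return only what changed between two mount tables.

    Each table maps a mount ID to a (mount_point, fs_type) tuple.
    """
    added = {k: v for k, v in new.items() if k not in old}
    removed = sorted(k for k in old if k not in new)
    changed = {k: v for k, v in new.items() if k in old and old[k] != v}
    return {"added": added, "removed": removed, "changed": changed}


def traffic_for_teardown(n: int) -> tuple:
    """Entries sent while unmounting n mounts one by one.

    Returns (entries sent with full resends, entries sent with deltas).
    """
    table = {i: (f"/mnt/{i}", "none") for i in range(n)}
    full = 0
    delta = 0
    for i in range(n):
        new = {k: v for k, v in table.items() if k != i}
        full += len(new)  # naive approach: resend the whole remaining table
        d = mount_delta(table, new)
        delta += len(d["added"]) + len(d["removed"]) + len(d["changed"])
        table = new
    return full, delta


print(traffic_for_teardown(4))  # (6, 4)
```

With full resends the total is n(n-1)/2 entries, while the delta approach sends n, which is why several thousand bind mounts at shutdown hurt so much more than a handful.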
_Anything_ that convinces systemd to generate a major spew is likely to take the system down. From what I've heard they had an earlier bug of the same kind; this one - with spew consisting of network device information.
It's not so much systemd codebase fault as one of the reasons why dbus is not suitable for high-volume traffic, combined with systemd using it for potentially huge amounts of such with PID 1 as originator. And no, bringing that festering shitpile of protocol into the kernel won't make the things any better - dbus is broken by design.
Posted Oct 25, 2014 2:48 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link] (17 responses)
PS: no, Plumber is not in any way better.
Posted Oct 25, 2014 3:22 UTC (Sat)
by viro (subscriber, #7872)
[Link] (16 responses)
PID 1 is in extremely *bad* position for keeping track of everything and sending reliable notifications of everything. No mechanism would really help with that; it's just that dbus pretends to provide one that will cope with such demands. It can't. Neither can kdbus.
Every time somebody finds a way to trigger a huge amount of traffic originating at PID 1, it's a serious bug. And they insist on keeping track of a lot of system state in PID 1, with all kinds of traffic being sent. Asking for trouble...
Posted Oct 25, 2014 3:59 UTC (Sat)
by raven667 (subscriber, #5198)
[Link] (7 responses)
Posted Oct 25, 2014 4:47 UTC (Sat)
by viro (subscriber, #7872)
[Link] (6 responses)
And yes, _that_ particular bug had been fixed - current systemd (since March or so) doesn't produce quadratic amount of traffic in that situation. My point is that _anything_ that tricks it into hosing the bus with shitloads of traffic will cause the same kind of problem, kdbus or no kdbus.
IOW, it's something they have to watch out for, and sending a lot of stuff (a lot of kinds of stuff, even) as part of normal operation, expected by the rest of the system, is seriously asking for trouble.
Posted Oct 25, 2014 6:15 UTC (Sat)
by viro (subscriber, #7872)
[Link] (5 responses)
The point being, bugs happen; the architectural mistake there is what's making them a lot more severe. Namely, the use of dbus to send notifications of many kinds of system state changes as they are happening, with PID 1 as sender. And one *still* can trigger obscene amount of traffic there - mount tmpfs on /tmp/a, create a bunch of bindings in /tmp/a/* and then keep doing mount --move /tmp/a /tmp/b; mount --move /tmp/b /tmp/a. Nowhere near as bad as "it panics on shutdown when there's a lot of mounts", but the same "let's keep telling dbus-daemon about those changes, no matter what" easily translates into severe slowdowns and OOMs. Single syscall, done in constant time kernel-side, ends up with massive dbus spam, and no throttling. Moreover, the processes receiving those notifications can bloody well get the same information themselves, and do it cheaper...
Posted Oct 25, 2014 14:56 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Nov 2, 2014 15:27 UTC (Sun)
by nix (subscriber, #2304)
[Link] (3 responses)
The same applies to anything at all related to network interfaces.
Presumably this traffic is meant to be consumed by udev, i.e. it's a replacement for the existing in-kernel uevent messages over the netlink socket. Seems like a rather baroque, ludicrous, and bug-prone change to me.
Posted Nov 2, 2014 16:33 UTC (Sun)
by raven667 (subscriber, #5198)
[Link]
Posted Nov 2, 2014 17:08 UTC (Sun)
by viro (subscriber, #7872)
[Link] (1 responses)
There's a very strong smell of PHB all over the design. Worse, a PHB that had been told by some conslutant about Web 2.0 and social media being The Thing for millennial generation and decided to have a local equivalent of twitter built for communication with the plebes. It doesn't work well? Why, let's move it to the critical servers; those are on beefier intertubes, or something... Still doesn't work well? Too fucking bad for those who maintain those servers - it's their responsibility now (and of course, any questions regarding the basic design of the damn thing are countered with generous loads of "we had it behave that way before, therefore it must behave the same").
And yes, I am talking about dbus and plans of moving it kernel-side ;-/
Posted Nov 2, 2014 21:35 UTC (Sun)
by johannbg (guest, #65743)
[Link]
That nack of his was a bit weird if you ask me but I guess I need to sacrifice a chicken, dance on one foot and drink some of that kernel koolaid to get my mind into the kernel cult and communication.
Posted Oct 25, 2014 6:03 UTC (Sat)
by johannbg (guest, #65743)
[Link] (3 responses)
The underlying problem is basically the same: parallelizing X, where X can be d-bus, sockets, file system jobs, you name it.
"PID 1 is in extremely *bad* position for keeping track of everything and sending reliable notifications of everything. No mechanism would really help with that"
So let's hear it: based on the function of PID 1, how would you do it if not in PID 1?
What architectural design do you have in mind to solve this?
Posted Oct 25, 2014 6:38 UTC (Sat)
by viro (subscriber, #7872)
[Link] (2 responses)
And if you do insist on IPC, for some reason, you could bloody well start a caching daemon on demand (with e.g. timeout for inactivity). There's no reason whatsoever to keep that in PID 1 or anywhere near it.
We already have mechanisms for parallelizing. Had them for more than four decades. Called "processes"...
Having systemd forwarding a bunch of stuff it gleans from the kernel is asking for bottlenecks, for no good reason. System calls are not going away; not unless you want a truly monumental bottleneck with systemd playing the role of Mach server. Even then read(2) and open(2) wouldn't disappear, including those of /proc/self/mountinfo...
Posted Oct 25, 2014 8:40 UTC (Sat)
by johannbg (guest, #65743)
[Link] (1 responses)
We have to agree to disagree on the push vs pull implementation.
Posted Oct 25, 2014 19:23 UTC (Sat)
by dlang (guest, #313)
[Link]
Posted Oct 25, 2014 15:13 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link] (3 responses)
Systemd uses a protected DBUS interface, so if you're able to DDoS it then you already have enough capabilities to wreck the system. It's not a fatal flaw in the design.
Also, it looks like KDBUS can help to throttle the senders - the regular DBUS daemon can't really do it cleanly.
Posted Nov 2, 2014 15:28 UTC (Sun)
by nix (subscriber, #2304)
[Link] (2 responses)
Or they could read /proc/self/mounts or /proc/self/mountinfo and get an always-reliable, namespace-aware view with none of this nonsense.
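As a rough sketch of the "pull" approach described above, the following reads the kernel's own mount table for the calling process. The field layout follows the proc(5) description of /proc/[pid]/mountinfo; the function names and the choice of which fields to keep are assumptions for this example, and octal escapes such as `\040` in paths are deliberately left undecoded.

```python
def parse_mountinfo(text: str) -> list:
    """Parse the contents of a mountinfo file into a list of dicts."""
    mounts = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(" ")
        # A lone "-" ends the variable-length optional fields; the six
        # fixed fields before it occupy indexes 0 through 5.
        sep = fields.index("-", 6)
        mounts.append({
            "mount_id": int(fields[0]),
            "parent_id": int(fields[1]),
            "mount_point": fields[4],
            "fs_type": fields[sep + 1],
            "source": fields[sep + 2],
        })
    return mounts


def read_mounts(path: str = "/proc/self/mountinfo") -> list:
    """Return the mounts visible in the caller's own mount namespace."""
    with open(path) as f:
        return parse_mountinfo(f.read())
```

Because the file is read through /proc/self, the result reflects the mount namespace of whichever process asks, with no intermediary having to track or forward that state.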
Posted Nov 2, 2014 17:09 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Besides, netlink sockets can block senders just as well as kdbus.
Posted Nov 2, 2014 17:31 UTC (Sun)
by viro (subscriber, #7872)
[Link]
Posted Oct 19, 2014 12:35 UTC (Sun)
by cortana (subscriber, #24596)
[Link] (1 responses)
I believe the problem with this approach is that if a non-pid-1 process with 'child subreaper' dies, all its children are reparented to pid 1, and there's no way for the restarted process to get back to the same state it was in before.
OTOH, you could take the approach that in that case, the real pid 1 kills everything before restarting the child, so it would be a bit like a crash followed by a fast reboot. That would be interesting to see.
Posted Oct 19, 2014 21:45 UTC (Sun)
by neilbrown (subscriber, #359)
[Link]
Not quite. The children are reparented to the next child-subreaper up the process tree, which will often be pid-1.
systemd could run normally as a parent and a child, both of which are 'child subreapers'. If the child dies, the parent re-starts it and feeds the new child any completion information for other processes that exit.
Posted Oct 19, 2014 15:11 UTC (Sun)
by misc (subscriber, #73730)
[Link] (2 responses)
And so far, I have used systemd for a few years now, and it has caused me a fatal issue only once, on a rawhide system. I have had worse reports of problems on btrfs. People keep repeating this as if there were lots of crashes, but so far the evidence does not indicate so.
Posted Oct 19, 2014 17:07 UTC (Sun)
by raven667 (subscriber, #5198)
[Link] (1 responses)
Posted Oct 25, 2014 2:13 UTC (Sat)
by zblaxell (subscriber, #26385)
[Link]
The same can be said of the Linux kernel, but I still greet each release with a full dose of well-deserved skepticism.
For what it's worth, I stick to discussions about the problems with systemd I've personally discovered. Other people have written at length about the ones I haven't.
I've used some of those millions of systems you're talking about. When they work they're OK, but in a crisis, a lot of surprising and disruptive behavior gets in the way of response, which is why I still choose to not run any outside of a test lab. Arguably the "surprising" part can be fixed with experience, but "disruptive" can only be fixed by patches (patches to default unit files that you have to install on every machine are still patches).
Posted Oct 19, 2014 14:45 UTC (Sun)
by flussence (guest, #85566)
[Link] (2 responses)
The same can be said of Btrfs — with a slight difference in the storage world though, nobody sane would run a system of two dozen interdependent moving parts with no redundancy or backups and expect it to stay working for long.
Posted Oct 20, 2014 0:31 UTC (Mon)
by pizza (subscriber, #46)
[Link] (1 responses)
> The same can be said of Btrfs
No, it can't.
Posted Oct 24, 2014 22:16 UTC (Fri)
by flussence (guest, #85566)
[Link]
I've been a bit more forgiving of other filesystems since then, since the de-facto standard has forced me to *really* lower my own standards.
Posted Oct 25, 2014 12:25 UTC (Sat)
by ms_43 (subscriber, #99293)
[Link] (1 responses)
In other words, this is purely a downstream distro packaging / policy issue. Why do you think that there is an onus on "systemd people" to "understand" or "fix their implementation" here?
Posted Oct 25, 2014 14:24 UTC (Sat)
by rahulsundaram (subscriber, #21946)
[Link]
https://fedoraproject.org/wiki/Fedora_14_talking_points#S...
Posted Oct 17, 2014 10:45 UTC (Fri)
by juliank (guest, #45896)
[Link] (4 responses)
Posted Oct 17, 2014 11:54 UTC (Fri)
by smurf (subscriber, #17840)
[Link] (3 responses)
Posted Oct 18, 2014 0:17 UTC (Sat)
by ewan (guest, #5533)
[Link] (2 responses)
It's mostly a good system, but it does rather rely on the idea that everyone involved is acting with good faith and this is not that. It's a wholly destructive lashing out - not an attempt to improve anything, just to damage a project that's doing something Ian hates. The previous rounds of this argument were messy, but they were just part of the process; this is an abuse of process, and I don't see how Debian can tolerate this behaviour and still hope to retain the ability, credibility, and respect to be able to do what it's always done.
Posted Oct 18, 2014 1:04 UTC (Sat)
by anselm (subscriber, #2796)
[Link] (1 responses)
German political activist Rosa Luxemburg is credited with the quote, “Freedom is always the freedom of those who think differently”. Ian Jackson and his seconders are 100% free to propose a GR even if it seems ill-advised and basically flogging a horse that is practically fossilised by now. This shouldn't reflect badly on the project as a whole, as the project is simply upholding its ideals by allowing its members to introduce GRs, even ill-advised ones, in accordance with Debian's constitution as they see fit. If anything it reflects badly on Ian Jackson's good judgment for bringing the issue up right now, when he could have done it months ago or else waited until after the Jessie release.
Posted Oct 20, 2014 6:43 UTC (Mon)
by tuomasjjrasanen (guest, #86050)
[Link]
This is probably one of the most valuable comments in this thread.
It's about freedom to think differently and freedom to disagree with others. And it works both ways. That is why it is Debian.
systemd-shim has caused breakage on my testing systems, so I think it doesn't work that well. I guess I could just embrace systemd, but that has caused breakage too. What am I missing?
It's free software: be happy to benefit from the work of others, but this means at times that things won't go the way you want. When this happens the proper attitude IMHO is to:
And I don't think anyone owes me anything just because I grace their software with my use (tongue firmly in cheek ;).
But that's not how Debian works, or at least how it's worked for me for the past few years. The first principle that I have observed is to not break running systems, even if they run testing. The second is to give choices, and the third to migrate smoothly between versions. And sadly, disruptions in testing are the herald of larger disruptions in upgrades.
There are some important differences between btrfs and systemd.
systemd doesn't eat data in and of itself, but it does get irretrievably stuck, kill innocent processes, and even panic the kernel when it fails.
It's not really worth bothering with any IPC, let alone one of the push-instead-of-pull variety.
I suspect that parent could be kept simple enough that the chance of misbehaviour was as low as for a traditional init.
https://lwn.net/Articles/401856/
I don't see how Debian can tolerate this behaviour and still hope to retain the ability, credibility, and respect to be able to do what it's always done.