How Debian managed the systemd transition
Posted Sep 16, 2015 22:54 UTC (Wed) by luto (guest, #39314)
In reply to: How Debian managed the systemd transition by josh
Parent article: How Debian managed the systemd transition
It has a big advantage over the current scheme: you can start it early and still don't need to worry about restarting it. It has no advantage over kdbus in terms of where the code is loaded from or ease of initialization, but I don't think it has a disadvantage either.
AIUI, the current scheme comes from the idea that all userspace code running after startup must reside on a non-initramfs mount. I've heard people say that it's not even possible to keep an initramfs program running after pivot_root. This is simply incorrect. Back when initramfs was actually ramfs, it wasted unpageable memory (just like kernel code), but initramfs is tmpfs nowadays.
Heck, there's no fundamental need for systemd to re-exec itself after pivot_root either, although, given that daemon-reexec is well-supported, it's probably a good idea from a forced testing and memory conservation perspective.
As a concrete, if dubious, benefit, udevd really could depend on dbus even without kdbus. Just require that dbus-daemon be started before udevd. (If this happened, I would drop udevd as part of the virtme minimal guest and I'd seriously consider busybox's udev as an alternative, but that's a bit off-topic.)
Posted Sep 16, 2015 23:08 UTC (Wed)
by josh (subscriber, #17465)
[Link] (26 responses)
It does have at least two disadvantages there. First, getting dbus-daemon and all of its dependencies into the initramfs would prove rather annoying. Statically linking it isn't a solution (distros, dependency management, and static linking don't mix well), and adding a pile of libraries to the initramfs doesn't appeal. But even after doing that, which is certainly doable, "don't need to worry about restarting it" is a bug, not a feature; dbus-daemon is apparently utterly incapable of handling a restart, but it needs to restart on upgrade. kdbus doesn't have that problem, because it doesn't need a userspace daemon. (It needs some initial setup, but systemd does that, and systemd handles upgrades just fine.)
Posted Sep 16, 2015 23:24 UTC (Wed)
by luto (guest, #39314)
[Link] (25 responses)
I think your argument here is a bit confused. dbus-daemon is indeed apparently utterly incapable of handling a restart, so you can't upgrade it without rebooting (or blowing up everything that depends on it). But the kernel and, hence, kdbus, is utterly and completely incapable of being upgraded without rebooting, so the behavior is similar.
I would argue that the real issue with current distros is that they might actually try to upgrade dbus-daemon on disk *and restart it without rebooting*, which is doomed unless a *userspace* dbus daemon gets a major rewrite.
So I still don't see how kdbus is any better at all in this regard, aside from the fact that distros have already figured out how to build the kernel as a self-contained thing but might have trouble building a minimal static dbus-daemon. (It would work fine as a dynamic library with eager binding, too, but that's ugly.)
I'll grant that kdbus is probably a much more streamlined, self-contained piece of code than dbus-daemon, but that's more or less irrelevant wrt this issue.
Also, the userspace approach has a huge advantage here: you can run different versions of it in different containers.
Posted Sep 17, 2015 0:27 UTC (Thu)
by einstein (guest, #2052)
[Link] (3 responses)
Actually, the kernel can be upgraded without a reboot. I was using ksplice for that back in 2009 or so, and the feature is coming together in mainline.
Posted Sep 17, 2015 0:32 UTC (Thu)
by luto (guest, #39314)
[Link] (2 responses)
I've gotten emails from the ksplice team asking me how the heck they're supposed to handle a small number of individual entry changes I've made, and those are tiny compared to replacing the whole kernel.
* Within some reasonable limits.
Posted Sep 17, 2015 0:47 UTC (Thu)
by josh (subscriber, #17465)
[Link]
I'd argue that if you can successfully save userspace, kexec a new kernel, and seamlessly reload userspace, that's a huge accomplishment that counts as a "live" kernel upgrade.
Posted Sep 22, 2015 16:20 UTC (Tue)
by jejb (subscriber, #6654)
[Link]
Hey, that's not fair: to go from n to n+1 you know the only way is to save and restore the kernel state in a version-independent manner, so you're trying to define the only possible method out of your challenge. The problem with the method is the time it takes, but there are people working on it.
Posted Sep 17, 2015 1:07 UTC (Thu)
by josh (subscriber, #17465)
[Link] (20 responses)
(I'm going to ignore the case of unloading and reloading kdbus.ko, here, because I doubt you can do that without stopping all dbus users, so that doesn't count either. It does mean you could upgrade kdbus without upgrading the kernel, but that won't make sense once kdbus gets merged into the kernel. It also doesn't address your point.)
So, my contention is that if you ran dbus-daemon from the initramfs, then in addition to the pain of building a dbus-daemon that can run from the initramfs, while handling services and configuration files both from the initramfs *and* from the root filesystem, you'd also have cases where you need to reboot to upgrade dbus-daemon, because you want to upgrade the corresponding userspace and your userspace can't cope with an old dbus-daemon. (It *especially* can't cope with the dbus package getting upgraded on the filesystem but the running version being older than the installed package.)
Posted Sep 17, 2015 1:32 UTC (Thu)
by luto (guest, #39314)
[Link] (19 responses)
Sure, kdbus doesn't read config files, but there is no reason whatsoever that a userspace dbus daemon should need to read config files, especially if it's aiming for feature parity with kdbus. Similarly, kdbus claims ABI compatibility, but a userspace dbus daemon really ought to do the same.
I get kind of annoyed when kdbus gets compared to dbus-daemon-as-it-exists and the favorable comparisons are used as an argument for why kdbus is a good idea. Dbus-daemon has all kinds of problems, but, after reading far too many emails about it and thinking about it for far too long, I'm having trouble believing that there is a single respect in which kdbus solves a problem that a simple, streamlined userspace daemon can't easily solve.
If current dbus-daemon barfs when its package is upgraded under it, that's *pathetic*, but it's still not a good reason why distros should be excited about kdbus.
(The streamlined userspace daemon would need help from an improved AF_UNIX credential mechanism, but that's easy.)
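For reference, the credential mechanism that AF_UNIX already offers on Linux is `SO_PEERCRED`, which reports the pid, uid, and gid the kernel recorded for the other end of a connected socket. A minimal sketch in Python (Linux-only; the helper name `peer_credentials` is made up for illustration) of what a userspace daemon can learn about a client today without trusting anything the client says:

```python
import os
import socket
import struct

# struct ucred on Linux: pid_t pid; uid_t uid; gid_t gid
UCRED = struct.Struct("iII")

def peer_credentials(sock):
    """Return the (pid, uid, gid) the kernel recorded for the peer of sock."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, UCRED.size)
    return UCRED.unpack(raw)

# A connected AF_UNIX pair stands in for a daemon/client connection.
daemon_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid, uid, gid = peer_credentials(daemon_end)
print(f"peer pid={pid} uid={uid} gid={gid}")
daemon_end.close()
client_end.close()
```

Note that these credentials are a snapshot taken when the socket was connected (or created by `socketpair`), not a live view of the peer.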
Posted Sep 17, 2015 2:07 UTC (Thu)
by josh (subscriber, #17465)
[Link] (15 responses)
That hypothetical non-crufty daemon would almost never need upgrading, sure. And neither does kdbus, so the comparison works. But the dbus-daemon we have *today* doesn't belong in an initramfs, and that's where this discussion started. And I see a distinct lack of people working on a hypothetical non-crufty dbus-daemon, hence why it remains hypothetical.
Apart from that, I can think of several things kdbus can do that an arbitrarily lightweight dbus-daemon can't, which explains part of why nobody seems to want to work on a hypothetical non-crufty dbus-daemon. Most notably, it eliminates a context switch from every message passed (two from every round-trip). If you had a "non-crufty" dbus-daemon that didn't need to touch the actual messages, what remaining non-cruft purpose does the daemon serve? Even having dbus-daemon involved in setup or broadcasts represents unnecessary overhead.
Posted Sep 17, 2015 2:32 UTC (Thu)
by josh (subscriber, #17465)
[Link] (1 responses)
Posted Sep 17, 2015 5:55 UTC (Thu)
by luto (guest, #39314)
[Link]
Dbus is a nasty model for things like filesystems, though. Some kind of fast capability-based transport would be much better suited, especially since a file descriptor (or directory reference or whatever) maps quite nicely to a capability.
Posted Sep 17, 2015 5:52 UTC (Thu)
by luto (guest, #39314)
[Link] (12 responses)
But context switches should be decently under 2 µs on a modern system. (The atrocious performance of libgdbus + dbus-daemon has *nothing* to do with the extra context switch.) With some optimization, which certainly could be done, I bet we can significantly improve context switch performance.
In any event, for applications that care about throughput, the extra context switch is a red herring. Under load, a good central daemon will process many messages per time slice, so the throughput bottleneck is much more likely to be message routing and such rather than context switches. Under that type of load, having a central daemon shouldn't be much slower than doing everything in the kernel. Kdbus is IMO unlikely to be particularly fast in terms of CPU time used per message because the per-message processing is rather complex.
With a userspace mechanism built on top of a serious IPC primitive, the extra context switch goes away because the central daemon can easily introduce parties for direct communication. Linux has no such mechanism (other than SCM_RIGHTS). seL4 does, and I suspect (although I don't know for sure) that the other L4 systems do as well. Binder also looks reasonable for such uses, even though it's rather crufty in other respects.
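To make the `SCM_RIGHTS` case concrete, here is a rough sketch of a central daemon introducing two clients so that they can then talk without the daemon in the path. It assumes Linux and Python 3.9+ (for `socket.send_fds` and `socket.recv_fds`); the function names `introduce` and `accept_introduction` are hypothetical, and everything runs in one process purely to keep the example short:

```python
import socket

def introduce(broker_to_a, broker_to_b):
    """Broker side: create a fresh channel and hand one end to each client."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        # The file descriptors travel as SCM_RIGHTS ancillary data.
        socket.send_fds(broker_to_a, [b"peer"], [left.fileno()])
        socket.send_fds(broker_to_b, [b"peer"], [right.fileno()])
    finally:
        # The broker drops its own copies; the in-flight descriptors stay valid.
        left.close()
        right.close()

def accept_introduction(conn):
    """Client side: receive one descriptor and wrap it as a socket object."""
    _data, fds, _flags, _addr = socket.recv_fds(conn, 16, 1)
    return socket.socket(fileno=fds[0])

# Each client has an ordinary connection to the broker.
broker_a, client_a = socket.socketpair()
broker_b, client_b = socket.socketpair()

introduce(broker_a, broker_b)
direct_a = accept_introduction(client_a)
direct_b = accept_introduction(client_b)

# From here on, messages between A and B never pass through the broker.
direct_a.send(b"hello from A")
reply = direct_b.recv(64)
print(reply)

for s in (broker_a, client_a, broker_b, client_b, direct_a, direct_b):
    s.close()
```

This only covers the introduction step; it says nothing about policy checks or broadcasts, which the daemon would still have to handle itself.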
For dbus in particular (userspace or kernel), I think that good performance under load will be tough, because dbus has a reliable in-order broadcast model. If everyone can broadcast to everyone in order, then the overall system needs to buffer each message until every receiver has read it. Since the senders and receivers are all asynchronous, that can be a lot of buffering. For kdbus in particular, the fancy "pool" model means (AFAICT) that all of the broadcast messages need to be buffered *separately* for each receiver. IMO this will work considerably worse than just doing it with a lightweight userspace daemon. Realistically, though, the fully-ordered broadcast model seems unlikely to hold up under load with *any* implementation whatsoever.
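A back-of-the-envelope sketch of that buffering argument, as a toy model only: it counts bytes that must be retained for in-order broadcast when each message is stored once until the slowest receiver has read it, versus when a copy is kept separately for every receiver. The function names and the numbers are invented for illustration and do not model the actual kdbus pool implementation:

```python
def shared_log_bytes(total_sent, consumed, msg_size):
    """One shared copy per message, kept until the slowest receiver has read it."""
    return (total_sent - min(consumed)) * msg_size

def per_receiver_bytes(total_sent, consumed, msg_size):
    """A separate copy per receiver, kept until that receiver has read it."""
    return sum(total_sent - c for c in consumed) * msg_size

total_sent = 100          # broadcast messages sent so far
msg_size = 64             # bytes per message
consumed = [100, 40, 0]   # how many messages each receiver has read

shared = shared_log_bytes(total_sent, consumed, msg_size)
separate = per_receiver_bytes(total_sent, consumed, msg_size)
print(shared, separate)   # 6400 10240
```

In this model the two schemes cost the same when exactly one receiver lags, and per-receiver buffering grows with the number of lagging receivers.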
Posted Sep 23, 2015 9:33 UTC (Wed)
by paulj (subscriber, #341)
[Link] (10 responses)
That pressure will be hard to deflect by pointing out the correct solution to an inefficient user-space implementation is not a very $FAVOURED_IPC_OF_THIS_DECADE-specific kernel implementation, but instead to implement an efficient user-space implementation + whatever generalised kernel services are needed for IPC problems in the abstract. To deflect that pressure for good requires coming up with that efficient user-space implementation really.
Posted Sep 23, 2015 9:47 UTC (Wed)
by lgeorget (guest, #99972)
[Link] (2 responses)
Actually, if I recall correctly the discussions on that matter, the main advantage of the in-kernel implementation of dbus was not that it reduces the number of context switches but that it reduces the number of memory copies because for the kernel, unlike a user-space daemon, copying memory can be as simple as mapping the same pages in two processes.
> those people (like any others) aren't keen to have their work wasted, there will now be pressure to integrate it.
As far as I can tell from reading the mails on the Linux mailing list, Greg Kroah-Hartman has shown himself to be very professional. He would surely be pleased to see his work in the mainline kernel, but not to the point of "pressuring" anyone.
Posted Sep 23, 2015 15:06 UTC (Wed)
by luto (guest, #39314)
[Link]
For small messages, this barely matters, and for large messages, both kdbus and AF_UNIX users can use memfds, which does even less copying.
Actually, for small messages, I'll only believe that the kdbus approach is faster if someone benchmarks it cleanly. The saved copy is only possible because the kernel writes to the receiver's pool when the message is sent, and that means that the kernel has to map the receiver's pool, and that's not free. (In fact it can be very slow -- modern CPUs are very good at mapping things, but at least x86 makes *unmapping* extremely expensive.)
Posted Sep 23, 2015 15:51 UTC (Wed)
by dlang (guest, #313)
[Link]
So the 'official' justification for kdbus is no longer performance, but rather security and/or reliability.
Posted Sep 23, 2015 16:29 UTC (Wed)
by raven667 (subscriber, #5198)
[Link] (6 responses)
I've been on the sidelines, following development on LWN, but that doesn't seem representative of the people involved or the effort which has gone into this, so I wouldn't presume that at all.
> not having questioned the notion that the performance problems with dbus-daemon were to do with kernel-userspace transitions
I believe there was awareness that the existing dbus-daemon implementation was not performant but also awareness that even a perfectly implemented userspace daemon has an upper limit on what it can do because of serializing, memory copying and context switches. Experience with the X Window protocol is instructive here as it sits in a very similar place in the software stack and there was a desire for dbus to be able to scale to the point of handling graphics data, which has already been demonstrated with X that a userspace daemon cannot do this without kernel support. Less copying and fewer context switches are also a boon for power usage, which is becoming more important every year, both for battery-powered and datacenter devices.
> efficient user-space implementation + whatever generalised kernel services are needed for IPC problems in the abstract.
This was the original goal and implementation many years ago, but it was flatly rejected by the kernel developers who would have needed to approve it, which is why we have the kdbus implementation we have today as opposed to some other design. The original thought was for a multicast AF_UNIX-type socket that a userspace daemon could control and which would be capable of zero-copy message delivery, but the network subsystem maintainers refused to entertain the changes required to make something like that work and be supportable, so a different design which is much more self-contained is being proposed instead.
Posted Oct 9, 2015 23:29 UTC (Fri)
by nix (subscriber, #2304)
[Link] (5 responses)
You don't need to be the kernel to share memory... and with memfds, you don't even need to be the kernel to share memory with untrusted partners.
Posted Oct 10, 2015 1:24 UTC (Sat)
by raven667 (subscriber, #5198)
[Link] (4 responses)
Posted Oct 13, 2015 13:50 UTC (Tue)
by nix (subscriber, #2304)
[Link] (3 responses)
Posted Oct 13, 2015 14:45 UTC (Tue)
by nybble41 (subscriber, #55106)
[Link] (2 responses)
I think one could argue that being given direct access to the graphics hardware, and thus effectively unlimited access to the entire system, should count as "kernel support". Sure, the driver code was inside the X server rather than compiled into the kernel or a loadable module, but it still required special interfaces used primarily by X, and it wasn't possible to run the X server as an ordinary, non-root user process.
Posted Oct 13, 2015 15:09 UTC (Tue)
by raven667 (subscriber, #5198)
[Link] (1 responses)
We've already gone down the route of adding dedicated IPC APIs for SysV, for Netlink, for X/Wayland and now for DBUS, which I see as following the evolution of OS design and the needs of the applications of the era when these interfaces were designed.
Posted Oct 13, 2015 22:49 UTC (Tue)
by nix (subscriber, #2304)
[Link]
Posted Sep 25, 2015 22:24 UTC (Fri)
by oak (guest, #2786)
[Link]
The result is that the daemon's message buffers grow until they take all your memory, your system message transport goes to swap (with everything else), and things become *really* slow until the problematic client is killed. If the client is woken up, the daemon and client can spend many minutes (or hours, depending on how much swap and buffering you have) during which the bus isn't very responsive. If allocations were mixed well enough, emptying the message buffer on the daemon doesn't actually free its dirtied memory because it has become fragmented.
This is D-BUS experience from 5-10 years ago on a semi-embedded device. Even worse, the user-space daemon gets its memory fragmented very easily and doesn't return memory to the system once it has been allocated. So, a local DoS is trivial to do with any client that can connect to the bus.
Some of the things where kernel *might* be able to improve on this are:
Posted Sep 17, 2015 6:11 UTC (Thu)
by alison (subscriber, #63752)
[Link] (2 responses)
Performance of Dbus-daemon aside, what about the more abstract question of whether a new message-passing API inside the kernel makes sense? From the sheer design point of view, why does the kernel provide 3 notification services for userspace via fanotify, dnotify and inotify? Presumably the rationale for adding fanotify to dnotify and inotify was that fanotify was superior. Why does that rationale not apply to kdbus?
Both kdbus and Dbus-daemon will continue to evolve. The issue of whether the kernel should have a new feature would logically be decided on the basis of what the kernel's rightful role is. Mostly the kernel's job is to abstract away the details of hardware and to provide userspace with services (e.g. boot) that it would have difficulty managing itself. Is IPC like that provided by kdbus such a service, or no? If not, why is it fundamentally different from notification, to which it seems logically related?
Posted Sep 17, 2015 11:46 UTC (Thu)
by lsl (subscriber, #86508)
[Link] (1 responses)
Wasn't the "rationale" more like "we hope it makes snake oil vendors stop torturing our enterprise kernels with horrible out-of-tree modules"? At least that's what I remember from it. It wasn't any less drama than kdbus. Also, it didn't get merged until attempts were made to rework it to be more generally useful, for tasks other than implementing snake oil products.
Posted Sep 23, 2015 20:35 UTC (Wed)
by foom (subscriber, #14868)
[Link]
> Experience with the X Window protocol is instructive here as it sits in a very similar place in the software stack and there was a desire for dbus to be able to scale to the point of handling graphics data, which has already been demonstrated with X that a userspace daemon cannot do this without kernel support.
X was doing just that without kernel support for nearly two decades. The MIT-SHM extension is worth noting.
* Assigning message buffers memory cost to corresponding client, so that admin can identify who's the culprit
* Better allocator that guarantees that after processing the messages, the emptied buffer can actually be freed for other purposes (i.e. allocation blocks don't mix data with unrelated life-times, e.g. send and receive messages or messages from/to different clients)
* If message is status broadcast, maybe having some mechanism where only last status update is buffered
* Suspending message sending if receiver isn't processing the messages
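Three of these ideas (per-client accounting, keeping only the last status update, and suspending senders when a receiver stalls) can be sketched in a few lines. This is a toy model in Python with made-up names (`ClientQueue`, `heaviest`), not a description of how kdbus or dbus-daemon behaves, and it does not touch the allocator point at all:

```python
from collections import deque

class ClientQueue:
    """Per-receiver queue whose memory use is charged to that receiver."""

    def __init__(self, name, limit_bytes):
        self.name = name
        self.limit_bytes = limit_bytes
        self.queued_bytes = 0
        self.messages = deque()
        self.latest_status = {}   # status key -> most recent payload only

    def enqueue(self, payload):
        """Queue a message; False tells the caller to suspend the sender."""
        if self.queued_bytes + len(payload) > self.limit_bytes:
            return False
        self.messages.append(payload)
        self.queued_bytes += len(payload)
        return True

    def post_status(self, key, payload):
        """Status broadcasts replace the previous value instead of piling up."""
        old = self.latest_status.get(key)
        if old is not None:
            self.queued_bytes -= len(old)
        self.latest_status[key] = payload
        self.queued_bytes += len(payload)

def heaviest(queues):
    """Point the admin at the client holding the most buffered memory."""
    return max(queues, key=lambda q: q.queued_bytes)

slow = ClientQueue("slow-client", limit_bytes=10)
fast = ClientQueue("fast-client", limit_bytes=10)

first_ok = slow.enqueue(b"12345678")    # fits within the limit
second_ok = slow.enqueue(b"abc")        # would exceed it: sender must wait
slow.post_status("battery", b"90")
slow.post_status("battery", b"89")      # replaces the earlier update
culprit = heaviest([slow, fast]).name
print(first_ok, second_ok, slow.queued_bytes, culprit)
```

In this sketch status updates are charged to the client but are not subject to the limit, which is an arbitrary simplification.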