Missing the AF_BUS
D-Bus implements a mechanism by which processes can send messages to each other. Multicast functionality is inherently part of the protocol; one message can be sent to multiple recipients. D-Bus promises reliable delivery, where "reliable" means that messages arrive in the order in which they were sent and that multicast messages will either be delivered to all recipients or, if that is not possible, to none. There is a security model built into the protocol whereby messages can be limited to specific recipients. All of these features are used by contemporary systems, which expect messaging to be robust and secure, with as little latency and overhead as possible.
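As a concrete illustration of the multicast side of the protocol, here is a minimal sketch of a client emitting a signal with the reference libdbus C API; the object path, interface, and member names are placeholders invented for this example, not part of any real service.

```c
/* Minimal sketch: emitting a D-Bus signal (a multicast message) with the
 * reference libdbus C API.  The object path, interface, and member names
 * below are placeholders invented for this example. */
#include <dbus/dbus.h>
#include <stdio.h>

int main(void)
{
    DBusError err;
    dbus_error_init(&err);

    /* Connect to the session bus; all traffic goes through dbus-daemon. */
    DBusConnection *conn = dbus_bus_get(DBUS_BUS_SESSION, &err);
    if (conn == NULL) {
        fprintf(stderr, "connect failed: %s\n", err.message);
        dbus_error_free(&err);
        return 1;
    }

    /* A signal has no single destination: the daemon delivers it, in
     * sending order, to every connection whose match rules ask for it. */
    DBusMessage *msg = dbus_message_new_signal("/org/example/Demo",
                                               "org.example.Demo",
                                               "SomethingHappened");
    if (msg == NULL)
        return 1;

    dbus_connection_send(conn, msg, NULL);   /* queue it for the daemon */
    dbus_connection_flush(conn);             /* push it out now */
    dbus_message_unref(msg);
    return 0;
}
```

Built against libdbus (the dbus-1 pkg-config module), this reaches every client that has added a matching match rule, which is exactly the ordered multicast behavior described above.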
The current D-Bus implementation uses Unix-domain sockets and a central routing daemon. It works, but the routing daemon adds context switches, overhead, and latency to each message it handles. The kernel is unable to help get high-priority messages delivered first, so all messages cause wakeups that slow down the processing of the most important ones; see this message for a description of how these problems can affect a running system. It has been evident for some time to the developers involved that a better solution must be found.
There have been a number of attempts in that direction. The previous time this topic came up, it was around a set of patches adding multicast capabilities to Unix-domain sockets. This idea was rejected with the claim that the Unix-domain socket code is already too complicated and there was not enough justification to make things worse by adding multicast capabilities. The D-Bus developers were told to simply use IPv4 sockets, which already have multicast support, instead.
What those developers actually did was to implement AF_BUS, a new address family designed to meet the needs of D-Bus. It provides the reliable delivery that D-Bus requires; it also has the ability to pass file descriptors and credentials from one process to another. The security mechanism is built in, with the netfilter code (augmented with a new D-Bus message parser) used to control which messages can actually be delivered to any specific process. The end result, it is claimed, is a significant reduction in D-Bus overhead due to reduced system calls; submitter Vincent Sanders claims "a doubling in throughput and better than halving of latency." See the associated documentation for details on how this address family works.
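For a sense of what the proposal would look like from user space, here is a purely hypothetical sketch. AF_BUS was never merged, so the address-family number, the sockaddr_bus layout, and the SOCK_SEQPACKET choice below are assumptions loosely modeled on the patch set's description; on a mainline kernel the socket() call will simply fail with EAFNOSUPPORT.

```c
/* Hypothetical sketch only: AF_BUS never went upstream.  The address-family
 * number, sockaddr_bus layout, and socket type here are illustrative
 * assumptions, not a real kernel ABI. */
#include <sys/socket.h>
#include <string.h>
#include <unistd.h>

#ifndef AF_BUS
#define AF_BUS 40                     /* made-up value for illustration */
#endif

struct sockaddr_bus {                 /* assumed layout */
    sa_family_t sbus_family;          /* AF_BUS */
    char        sbus_path[108];       /* filesystem path naming the bus */
};

/* Attach to a bus; once connected, the kernel (not a user-space daemon)
 * routes each message, so a multicast send is a single system call and
 * netfilter rules decide which peers may receive it. */
int join_bus(const char *path)
{
    int fd = socket(AF_BUS, SOCK_SEQPACKET, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_bus addr = { .sbus_family = AF_BUS };
    strncpy(addr.sbus_path, path, sizeof(addr.sbus_path) - 1);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```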
A factor-of-two improvement in a component that is widely used in Linux systems would certainly be welcome. The patch set, however, was not; networking maintainer David Miller immediately stated his intention to simply ignore the patch set entirely. His objections seem to be that IPv4 sockets are sufficient for the task and that reliable delivery of multicast messages cannot be done, even in the limited manner needed by D-Bus. He expressed doubts that the IPv4 approach had even been tried, and decreed: "We are not creating a full address family in the kernel which exists for one, and only one, specific and difficult user."
Vincent responded that a number of approaches have been tried and found wanting. IPv4 sockets cannot provide the needed delivery guarantees and do not allow for the passing of file descriptors and credentials. It is also important, he said, for D-Bus to be up and running before the networking subsystem has been configured; setting up IP interfaces on a contemporary system often requires communication over D-Bus. There really is no better solution, he said.
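The descriptor-passing point is concrete: AF_UNIX sockets can carry an open file descriptor as ancillary data, something AF_INET has no equivalent for. Below is a minimal sketch of the sending side; the receiver mirrors it with recvmsg() and CMSG_FIRSTHDR(). The function name is invented for illustration.

```c
/* Sending an open file descriptor over an AF_UNIX socket with SCM_RIGHTS.
 * This is the standard ancillary-data mechanism D-Bus relies on and which
 * AF_INET sockets do not provide.  'sock' is an already-connected AF_UNIX
 * socket and 'fd' is the descriptor to hand over. */
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

int send_fd(int sock, int fd)
{
    char dummy = '\0';                       /* must send at least one byte */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

    union {                                  /* properly aligned control buffer */
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = ctrl.buf,
        .msg_controllen = sizeof(ctrl.buf),
    };

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;            /* payload is file descriptors */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}
```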
He found support from a few other developers, including Alan Cox, who pointed out that there is no shortage of interprocess communication systems out there with requirements similar to D-Bus:
Everybody at the application level has been using these 'receiver reliable' multicast services for years (Websphere MQ, TIBCO, RTPGM, OpenPGM, MS-PGM, you name it). There are even accelerators for PGM based protocols in things like Cisco routers and Solarflare can do much of it on the card for 10Gbit.
He added that latency concerns are paramount on contemporary systems and that one of the best ways of reducing latency is to cut back on context switches and middleman processes. Chris Friesen added that his company uses "an out-of-tree datagram multicast messaging protocol family based on AF_UNIX" that could almost certainly be replaced by something like AF_BUS, were AF_BUS to be added to the mainline kernel.
There have been various other local messaging patch sets posted over the years. So it seems clear that there is a significant level of interest in having this sort of capability built into the Linux kernel. But interest alone is not sufficient justification for the merging of a large patch set; there must also be agreement from the developers who are charged with ensuring that Linux has a top-quality networking stack in the long term. That agreement is not yet there, so there may be a significant amount of multicast interpersonal messaging required before we have multicast interprocess messaging in the kernel.
| Index entries for this article | |
| --- | --- |
| Kernel | D-Bus |
| Kernel | Message passing |
| Kernel | Networking/D-Bus |
Posted Jul 5, 2012 4:39 UTC (Thu)
by hp (guest, #5220)
[Link] (14 responses)
but in any case the opportunity is there. My belief is that nobody is chasing this because for the vast majority of people D-Bus performance is not an actual problem. When I've asked for concrete examples of when it was a problem, things like Nokia N900 (iPhone 3G era hardware right?) come up, and poorly coded applications aren't ruled out and seem likely to be involved even in that case.
basically there is just no need to performance tune the kind of stuff dbus is normally used for on a stock Linux desktop... if something is only 1% of user-visible speed, making it double fast isn't perceptible.
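Taking hp's illustrative 1% figure at face value (it is an example, not a measurement), Amdahl's law makes the point quantitative:

```latex
S_{\text{overall}} = \frac{1}{(1-p) + p/s}
                   = \frac{1}{0.99 + 0.01/2} \approx 1.005
```

So doubling the speed (s = 2) of a part that accounts for p = 1% of the user-visible time improves the whole operation by roughly half a percent.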
people do show up on the mailing list using dbus on low resource embedded systems and needing ultra low latency or something, but in those cases dbus was pretty clearly a poor choice of hammer for the nail at hand.
I don't think Alan is wrong though. the dbus semantics and guarantees that make it slow are also what make it convenient, and app developers generally want those guarantees and apps are less buggy if they have them. So it might be nice to make this genre of thing fast, even if the simple notifications etc. used by the desktop aren't performance critical, there are other domains that might benefit from ordered, reliable delivery, lifecycle tracking, etc. there's no question a faster implementation of dbus would be more broadly useful beyond just the desktop.
Posted Jul 5, 2012 5:32 UTC (Thu)
by hp (guest, #5220)
[Link]
So most problems and solutions that apply to X11 will also apply to dbus.
I think the tradeoffs and guarantees made here are a pretty good guide to what desktop/mobile app developers want when they're writing a UI that's implemented as a "swarm of processes" (as all the X desktops are). Framed another way, this is what a local IPC system has to provide in order to support relatively reliable application code in this context. However, these tradeoffs are probably inappropriate for systems distributed over the internet or even over a cluster.
Based on dbus list traffic there seem to be development situations where similar tradeoffs make sense but the inherent slowdown of the central dispatch daemon is a problem. That's where kernel-accelerated dbus-like-thing would make sense maybe.
Posted Jul 5, 2012 9:17 UTC (Thu)
by kyllikki (guest, #4370)
[Link]
We most definitely are committed to improving the userspace side of D-Bus in addition to the kernel work (which was a project for the GENIVI alliance).
Our eventual aim, using all of these solutions together, is a tripling in throughput and a significant reduction in latency for the general case.
Posted Jul 5, 2012 10:33 UTC (Thu)
by smcv (subscriber, #53363)
[Link] (1 responses)
On the system bus, which is a trust boundary, poorly- or even maliciously-coded applications can never be ruled out, unfortunately.
> in those cases dbus was pretty clearly a poor choice of hammer for the nail at hand
People consider D-Bus to be a suitable transport for all sorts of things, desktop or not. The first sentence of the specification describes it as "a system for low-latency, low-overhead, easy to use interprocess communication", which probably contributes to the view that it's the right hammer for every nail - in practice, its current design tradeoffs tend to prioritize "easy to use" over low-latency.
Improving its latency, and avoiding priority inversion between the dbus-daemon and its clients, certainly increases the number of situations where D-Bus can be used. They might not be necessary for "the desktop bus", but that's by no means the only thing people use D-Bus for.
Improving the kernel-level transport is orthogonal to improving the user-space part (of which message (de)serialization is indeed likely to be the lowest-hanging fruit), and there's no reason they can't both happen.
> the dbus semantics and guarantees that make it slow are also what make it convenient
I absolutely agree that the convenient semantics - multicast signals, total ordering, conventions for lifecycle tracking and so on - are what make D-Bus valuable, and if you're willing to sacrifice those convenient semantics for performance, that's a sign that D-Bus is not right for you. Having said that, given the constraints of those semantics, the more efficient the better, and AF_BUS proves that there is room for improvement.
Posted Jul 5, 2012 13:01 UTC (Thu)
by hp (guest, #5220)
[Link]
What I meant here was, an app with lots of round trips in its protocol design or that shovels loads of data over the bus is going to be a perf problem. As a practical matter if you have user-visible situation xyz that appears slow, fixing dorky app behavior can be the fastest way to fix xyz.
> there's no reason they can't both happen
That's why I keep lobbying for the userspace changes to happen - a couple of them looked like they'd only take a few days of work. Hey, for all I know, someone did them already over the last few months. Anyway it's just bugging me (as you've no doubt gathered) that the kernel stuff is a kind of multi-year undertaking due to the difficult political issues, while the performance could be greatly improved without blocking on kernel devs...
So I'm just trying to give the potential userspace tasks some PR. Maybe someone reading these comments will want to work on them, we can dream...
(I know I'm telling those close to dbus things they already know. But it may not be apparent to those who aren't close to it that there's stuff they could do today.)
Posted Jul 5, 2012 17:15 UTC (Thu)
by smurf (subscriber, #17840)
[Link] (7 responses)
Newfangled stuff like multi-touch. Or keyboard input (for Chinese and whatnot).
You don't want that stuff to go through more context switches (and processes) than strictly necessary. So AF_BUS seems to be a Good Thing.
Posted Jul 5, 2012 19:26 UTC (Thu)
by iabervon (subscriber, #722)
[Link] (6 responses)
Posted Jul 5, 2012 19:53 UTC (Thu)
by hp (guest, #5220)
[Link]
The downside is mostly that it's a fair bit more work for apps to do stuff like this. Services don't necessarily need to track "registered clients" right now but with this kind of setup they have to, in addition to dealing with the raw sockets and other extra work.
A lot of the discussion of speeding up dbus is motivated by trying to make the easy thing work well for apps, instead of requiring app authors to sort out these tradeoffs.
Especially with the higher-level API in say glib's dbus support, though, it might be possible to near-automatically put certain objects on dedicated sockets. Just a matter of programming...
Posted Jul 6, 2012 5:51 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link]
But the minute you start using socketpairs - it becomes impossible.
Posted Jul 6, 2012 6:24 UTC (Fri)
by smurf (subscriber, #17840)
[Link] (3 responses)
* you lose the easy debugging and monitoring you now get with DBus (presumably, with AF_BUS you could use something like wireshark),
* the client now has to juggle multiple file descriptors, which requires an API change
* multiple file descriptors and reliable message ordering don't mix
Too many downsides if you ask me.
Posted Jul 6, 2012 6:58 UTC (Fri)
by Fowl (subscriber, #65667)
[Link] (1 responses)
Posted Jul 6, 2012 16:18 UTC (Fri)
by smurf (subscriber, #17840)
[Link]
dbus_connection_get_unix_fd() returns exactly one file descriptor. If you open a direct connection to some application, you have more than one file descriptor. How do you make your application select()/poll() on both (or more) of these?
Admittedly, on second thought, you could do it with epoll(). But it's still a change in semantics (you can't read from that file descriptor; worse, you can't write to it).
How would you propose to handle the monitoring problem? Let the daemon send a "somebody's listening for messages X, so if you exchange any of those privately, kindly send me a copy" commands to each and every client? Owch.
I'm not saying this cannot be done. I'm saying it's heaps more work, and more fragile, than simply moving the main chunk of this into the kernel, especially since there's already code which does that. And code is more authoritative than English.
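Mechanically, waiting on several descriptors at once is what poll() is for; a minimal sketch (the descriptor names are invented for illustration) follows. The objection above still stands: the work is in teaching libdbus and its users to expose and dispatch multiple descriptors, not in the system call itself.

```c
/* Waiting on a bus-daemon connection and a direct peer-to-peer connection
 * at the same time with poll().  'bus_fd' and 'peer_fd' are placeholder
 * names for two already-connected descriptors. */
#include <poll.h>

int wait_for_either(int bus_fd, int peer_fd)
{
    struct pollfd fds[2] = {
        { .fd = bus_fd,  .events = POLLIN },
        { .fd = peer_fd, .events = POLLIN },
    };

    /* Block until at least one connection has data to dispatch. */
    if (poll(fds, 2, -1) < 0)
        return -1;

    if (fds[0].revents & POLLIN)
        return 0;        /* message waiting on the bus connection */
    return 1;            /* message waiting on the direct connection */
}
```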
Posted Jul 6, 2012 17:32 UTC (Fri)
by iabervon (subscriber, #722)
[Link]
Of course, it's certainly possible that people will want high-speed IPC with DBus properties also, and it makes sense for DBus to be efficient regardless of whether it's running into performance constraints. But it doesn't make sense to use DBus for all communication, even if its performance could be made good enough.
Posted Jul 26, 2012 22:41 UTC (Thu)
by oak (guest, #2786)
[Link] (1 responses)
One of the worst issues is D-BUS message delivery reliability. All it needs is an app that subscribes for some frequent message (like device orientation) and then doesn't read its messages, either because it was suspended or just buggy. As message delivery needs to be reliable, D-BUS will then just buffer the messages and get slower and slower over time as it starts to swap.
The second issue is an overly complicated D-BUS setup. I think e.g. the N900 call handling goes through half a dozen daemons before the call UI pops up. Each of these steps adds its own socket buffering and process scheduling overhead in addition to other overheads (e.g. paging the processes back into RAM if they were swapped out etc).
Then there's the D-BUS daemon code itself. Ever wondered why something that's "just" supposed to read and write data from sockets is CPU bound instead of I/O bound? The D-BUS daemon spends a lot of CPU on message content marshaling.
Posted Jul 26, 2012 22:57 UTC (Thu)
by hp (guest, #5220)
[Link]
the second issue is not dbus's fault. that kind of thing is often from making a daemon when a library would be better. it's a bug in the app design.
the third issue I've mentioned repeatedly myself including in the threads I linked before.
but none of these three things are concrete examples of user visible operations. in most real world cases all three of these problems are gotten away with and it isn't perceptible. n900 is the most often mentioned case where they aren't, and if you're correct here, N900 has at least one really bad setup with half a dozen daemons.
Posted Jul 5, 2012 4:55 UTC (Thu)
by alonz (subscriber, #815)
[Link] (17 responses)
I can't help but feel David Miller's response is a tad hypocritical :(
After all, he was the one who practically shoved a new address family (AF_ALG) down the throats of the community as a “solution” to connecting kernel crypto with userspace—said solution being such a poor fit that it isn't being used anywhere, but stifling any opportunity to integrate an actually suitable solution.
(Yes, this is my personal hurtful spot, and I am grumpy.)
Posted Jul 5, 2012 5:47 UTC (Thu)
by daniel (guest, #3181)
[Link] (16 responses)
Posted Jul 5, 2012 6:00 UTC (Thu)
by alonz (subscriber, #815)
[Link] (5 responses)
Posted Jul 5, 2012 6:10 UTC (Thu)
by daniel (guest, #3181)
[Link] (4 responses)
Posted Jul 5, 2012 17:10 UTC (Thu)
by jond (subscriber, #37669)
[Link] (3 responses)
Posted Jul 7, 2012 1:40 UTC (Sat)
by daniel (guest, #3181)
[Link] (2 responses)
Posted Jul 13, 2012 5:43 UTC (Fri)
by Tov (subscriber, #61080)
[Link]
Posted Jul 15, 2012 7:55 UTC (Sun)
by philomath (guest, #84172)
[Link]
Posted Jul 5, 2012 18:26 UTC (Thu)
by josh (subscriber, #17465)
[Link] (9 responses)
Posted Jul 6, 2012 15:03 UTC (Fri)
by pspinler (subscriber, #2922)
[Link] (1 responses)
Certainly all that complexity can't be great for performance.
It's the argument I make for fibre channel v. iscsi. It's true that iscsi hardware (being just standard networking stuff) is a lot cheaper and does the job 90-95% of the time. But in the edge case, especially w.r.t latency, fibre still wins, largely because it's simple in comparison.
-- Pat
Posted Jul 9, 2012 2:35 UTC (Mon)
by raven667 (subscriber, #5198)
[Link]
That's something worth testing, scientifically.
> It's the argument I make for fibre channel v. iscsi. It's true that iscsi hardware (being just standard networking stuff) is a lot cheaper and does the job 90-95% of the time. But in the edge case, especially w.r.t latency, fibre still wins, largely because it's simple in comparison.
One thing about this example that I would like to point out: FC implements many of the features of Ethernet and TCP/IP ... differently, so in that sense the complexity is at least comparable, though probably not equal. As far as implementation complexity goes, I think FC gets off easier because, as a practical matter, it is used in closed networks, often with all components from the same vendor. Ethernet and TCP/IP have to deal with a lot more varied equipment and varied networks and have to be battle tested against _anything_ happening; all that extra implementation complexity has a real reason for being there.
Posted Jul 9, 2012 6:02 UTC (Mon)
by daniel (guest, #3181)
[Link] (6 responses)
Here's a lovely bit:
http://lxr.linux.no/#linux+v3.4.4/net/ipv4/tcp_output.c#L796
This is part of a call chain that goes about 20 levels deep. There is much worse in there. See, that stuff looks plausible and if you listen to the folklore it sounds fast. But it actually isn't, which I know beyond a shadow of a doubt.
Posted Jul 9, 2012 6:53 UTC (Mon)
by daniel (guest, #3181)
[Link] (3 responses)
http://lxr.linux.no/#linux+v3.4.4/net/ipv4/ip_output.c#L799
This code just kills efficiency by a thousand cuts. There is no single culprit, it is just that all that twisting and turning, calling lots of little helpers and layering everything through an skb editing API that successfully confuses the optimizer adds up to an embarrassing amount of overhead. First rule to remember? Function calls are not free. Not at the speeds networks operate these days.
Posted Jul 9, 2012 8:18 UTC (Mon)
by nix (subscriber, #2304)
[Link] (1 responses)
Posted Jul 9, 2012 23:06 UTC (Mon)
by daglwn (guest, #65432)
[Link]
Posted Jul 9, 2012 18:40 UTC (Mon)
by butlerm (subscriber, #13312)
[Link]
Much of the complexity of that function has to do with kernel support for fragmented skbs, which is required for packets that are larger than the page size. That is the sort of thing that would go away if the kernel adopted a kernel page size larger than the hardware page size in cases where the latter is ridiculously small.
I am not sure what the real benefits of managing everything in terms of 4K pages are on a system with modern memory sizes. Perhaps the idea of managing everything in terms of 64K pages (i.e. in groups of 16 hardware pages) could be revisited. That would dramatically simplify much of the networking code, because support for fragmented skbs could be dropped. No doubt it would have other benefits as well.
Posted Jul 9, 2012 9:11 UTC (Mon)
by gioele (subscriber, #61675)
[Link]
> This is part of a call chain that goes about 20 levels deep. There is much worse in there. See, that stuff looks plausible and if you listen to the folklore it sounds fast. But it actually isn't, which I know beyond a shadow of a doubt.
Don't you have some notes, implementation ideas or performance tests that you want to share with the rest of the kernel community? I'm pretty sure that they would love to hear how to cut the CPU overhead of UDP messages in half without regressions in functionality.
This kind of impact would surely reduce the battery consumption of mobile applications, so maybe the main developers will not be interested, but devs of mobile-oriented forks like Android surely will be.
Posted Jul 9, 2012 20:26 UTC (Mon)
by butlerm (subscriber, #13312)
[Link]
Posted Jul 5, 2012 5:09 UTC (Thu)
by alonz (subscriber, #815)
[Link]
Posted Jul 5, 2012 10:50 UTC (Thu)
by jezuch (subscriber, #52988)
[Link] (2 responses)
I, too, was wondering how they (D-Bus) [expect to] achieve this...
Posted Jul 5, 2012 13:11 UTC (Thu)
by hp (guest, #5220)
[Link]
Posted Jul 5, 2012 23:52 UTC (Thu)
by Tester (guest, #40675)
[Link]
Posted Jul 6, 2012 4:45 UTC (Fri)
by mgalgs (guest, #85461)
[Link]
Posted Jul 6, 2012 12:42 UTC (Fri)
by cesarb (subscriber, #6266)
[Link]
Why not instead convert the dbus daemon into a kernel module, like has been done in the past with the http daemon? It would avoid having to context switch to and from the daemon, and need no changes to the networking subsystem.
Note: I am joking.
Posted Jul 10, 2012 9:59 UTC (Tue)
by Hugne (guest, #82663)
[Link]
It's there already, with reliable delivery: modprobe tipc.
It does not pass SCM_RIGHTS or FDs, but a patch set that does this for node-local TIPC messaging would probably gain more acceptance than a new address family.
I asked on netdev if they had considered this, but I never saw a reply explaining why they didn't choose it.
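For reference, a node-local TIPC endpoint is an ordinary socket. Here is a rough sketch of a server binding a service name; the service type number 18888 is an arbitrary example value, and error handling is minimal.

```c
/* Rough sketch: bind a reliable-datagram TIPC socket to a service name.
 * The service type 18888 is an arbitrary example value. */
#include <sys/socket.h>
#include <linux/tipc.h>
#include <string.h>
#include <unistd.h>

int tipc_service_socket(void)
{
    int fd = socket(AF_TIPC, SOCK_RDM, 0);   /* reliable datagram messaging */
    if (fd < 0)
        return -1;

    struct sockaddr_tipc addr;
    memset(&addr, 0, sizeof(addr));
    addr.family = AF_TIPC;
    addr.addrtype = TIPC_ADDR_NAMESEQ;       /* bind a range of instances */
    addr.scope = TIPC_NODE_SCOPE;            /* visible on this node only */
    addr.addr.nameseq.type = 18888;
    addr.addr.nameseq.lower = 0;
    addr.addr.nameseq.upper = 0;

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```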
Posted Jul 10, 2012 10:55 UTC (Tue)
by nhippi (subscriber, #34640)
[Link]
Posted Jul 12, 2012 5:28 UTC (Thu)
by slashdot (guest, #22014)
[Link] (4 responses)
Make it so that the server creates a UNIX socket with the same name it wants to take, and the client connects to it.
One of those could be an enumeration/activation/etc. server (but not a message router!).
For multicast, do the same and connect to all message broadcasters, using inotify to notice when new ones come up; the publisher just sends to all connected clients.
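The "notice when new ones come up" part maps directly onto inotify. A minimal sketch of watching a socket directory follows; the /run/mybus path and the function name are made up for illustration.

```c
/* Watch a directory for newly created per-service sockets with inotify.
 * "/run/mybus" is a made-up path used only for illustration. */
#include <sys/inotify.h>
#include <stdio.h>
#include <unistd.h>

int watch_bus_dir(void)
{
    int ifd = inotify_init1(IN_CLOEXEC);
    if (ifd < 0)
        return -1;

    /* One IN_CREATE event per socket that appears in the directory. */
    if (inotify_add_watch(ifd, "/run/mybus", IN_CREATE) < 0) {
        close(ifd);
        return -1;
    }

    /* Buffer aligned for struct inotify_event, as the man page recommends. */
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t len = read(ifd, buf, sizeof(buf));   /* blocks for the next batch */

    for (char *p = buf; len > 0 && p < buf + len; ) {
        struct inotify_event *ev = (struct inotify_event *)p;
        if (ev->len > 0)
            printf("new endpoint: %s\n", ev->name);  /* connect() to it here */
        p += sizeof(*ev) + ev->len;
    }

    close(ifd);
    return 0;
}
```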
ZeroMQ can automate most of this, if desired.
The only kernel support that might be needed is support for having unlimited size Unix socket buffers and charging that memory to the receiver, so that the OOM killer/rlimit/etc. kills a non-responsive multicast receiver rather than the sender.
A more sophisticated solution that doesn't duplicate the socket buffer for each subscriber would be even better, but probably doesn't matter for normal usage cases.
Alternatively, get rid of signals, and instead have a key/value store abstraction where you can subscribe to value updates: this way, if the buffer space is full, you can just send an "overflow" packet and the client manually asks for the values of all its watched keys.
Posted Jul 12, 2012 5:57 UTC (Thu)
by michelp (guest, #44955)
[Link] (3 responses)
Posted Jul 12, 2012 6:31 UTC (Thu)
by michelp (guest, #44955)
[Link] (2 responses)
Posted Jul 12, 2012 6:57 UTC (Thu)
by neilbrown (subscriber, #359)
[Link] (1 responses)
It is true that we seem to add filesystems with gay abandon, so maybe a similar case could be made for address families...
The reason that I would avoid adding multiple address families for IPC is that someone would want a mix of features from one and features from another (e.g. multicast and fd passing from AF_INET and AF_UNIX). So would we add yet another one that does both?
Posted Jul 12, 2012 15:02 UTC (Thu)
by michelp (guest, #44955)
[Link]
Can you give me an example of who is burdened by what exactly in this case?
> The reason that I would avoid adding multiple address families for IPC is that someone would want a mix of features from one and features from another (e.g. multicast and fd passing from AF_INET and AF_UNIX). So would we add yet another one that does both?
That seems like a speculative reason to reject existing and well established software patterns like d-bus, that are correctly leveraging a well established extension mechanism for adding new protocol families. Again, if it wasn't meant to be extended, then why have protocol families at all? Why was the 'sk is first member of the struct' pattern so well thought out from the beginning? It was done this way to provide ways for the mechanism to grow and evolve.