Kroah-Hartman: AF_BUS, D-Bus, and the Linux kernel

Posted Feb 11, 2013 20:54 UTC (Mon) by hp (guest, #5220)
In reply to: Kroah-Hartman: AF_BUS, D-Bus, and the Linux kernel by Cyberax
Parent article: Kroah-Hartman: AF_BUS, D-Bus, and the Linux kernel

You can find some hashing-out of the reconnect thing here:
http://lists.freedesktop.org/archives/dbus/2005-March/002...

I don't think the problem has anything to do with dbus-daemon; you can restart it fine.

The problem is that apps don't handle the restart... and that for them to handle it would require them to write quite a lot of complex code that would rarely be tested.

Say hypothetically that someone wrote all that code, and then religiously lobbied app developers to keep writing it in new apps, and kept testing it and fixing bugs...

Even given this hypothetical work, I personally would never trust that every app had that codepath working at any given point. So I would just reboot anyway.

The difference between dbus and other daemons here is not that dbus somehow forbids restart. It's that dbus has persistent and stateful connections (and that's core and essential to the purpose of dbus).

Restarting dbus is like saying you want to restart the X server without killing any X apps. It's the same technical challenge as that. Namely, all apps would have to track and be able to restore all the state kept by the server.

Rule of thumb with dbus: what does X protocol do? dbus usually does the same thing and has the same pros and cons.

We have looked some at a client library design that makes it easier to handle daemon restart - basically a library where you provide a cascade of callbacks ("connect to bus handler", "service is now owned handler", etc.) and those callbacks could be re-run on bus re-connect. However, it is a lot harder for app developers to understand this kind of API, and in any case, existing apps aren't doing it this way.
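A minimal sketch of what such a callback-cascade library might look like, assuming nothing about any real D-Bus binding: `FakeBus`, `ReconnectingClient`, and the `org.example.*` names are all hypothetical stand-ins. The app registers its setup steps as handlers once, and the library replays them against a fresh connection after every reconnect.

```python
class FakeBus:
    """Hypothetical stand-in for a bus connection; a new one starts empty."""

    def __init__(self):
        self.owned_names = set()
        self.match_rules = set()


class ReconnectingClient:
    """Hypothetical client that replays registered handlers on (re)connect."""

    def __init__(self, connect):
        self._connect = connect  # callable returning a fresh connection
        self._handlers = []      # run in registration order on every connect
        self.bus = None

    def on_connect(self, handler):
        self._handlers.append(handler)
        return handler

    def connect(self):
        # A restarted daemon has forgotten everything, so rebuild all state.
        self.bus = self._connect()
        for handler in self._handlers:
            handler(self.bus)


client = ReconnectingClient(FakeBus)


@client.on_connect
def claim_name(bus):
    bus.owned_names.add("org.example.Demo")


@client.on_connect
def subscribe(bus):
    bus.match_rules.add("type='signal',interface='org.example.Events'")


client.connect()
first = client.bus
client.connect()  # simulates a daemon restart: new connection, handlers replayed
```

The sketch only restores state that the handlers themselves set up. Anything the app did outside a handler (pending calls, objects obtained from other services) is still lost, which is the part that is hard to get right across many codebases.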

Kroah-Hartman: AF_BUS, D-Bus, and the Linux kernel

Posted Feb 11, 2013 23:04 UTC (Mon) by daniel (guest, #3181)

So, if I may humbly misquote the content of your post: we are, on this day, presented with a rare and wonderful opportunity to introduce a whole new class of kernel bugs, which so far have been mainly relegated to userspace and mainly manifested by the likes of Gnome and KDE apps. By no means should we miss this wonderful opportunity to grab this shiny new tool by the trigger and blow our collective legs off.

Of course, we could always consider waiting for DBus to stop causing user space issues before welcoming it into the kernel with open arms, hearts and minds. I'm trying to avoid using the word "gaping" here...

Kroah-Hartman: AF_BUS, D-Bus, and the Linux kernel

Posted Feb 11, 2013 23:10 UTC (Mon) by hp (guest, #5220)

I have no idea what you're trying to say.

Kroah-Hartman: AF_BUS, D-Bus, and the Linux kernel

Posted Feb 12, 2013 0:04 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)

That's the problem - the connection between the dbusd and clients has too much state. It could have been made stateless, or at least with the state confined to the client-side library.

It's not really possible with the current DBUS protocol. That's another reason why naively networked DBUS is not such a good idea. However, layering DBUS on top of something like ZeroMQ could be interesting.

Kroah-Hartman: AF_BUS, D-Bus, and the Linux kernel

Posted Feb 12, 2013 1:00 UTC (Tue) by hp (guest, #5220)

It's meaningless to say "too much state" in the abstract without understanding what the state is and why it exists.

dbus was widely adopted because it solved certain problems that were previously unsolved, by making different tradeoffs vs previous solutions.

Anybody can show up and say "oh that tradeoff has this downside." That's why it's called a tradeoff.

Anyone on the Internet could prove me wrong by showing the code which has the pros without the cons. That's the beauty. Everyone would jump to use a best of all worlds solution like that. Meanwhile, people are using a solution that exists.

In my view, the client libs could be designed to better support reconnection, but ultimately the app has to handle the case. Neither the daemon nor the protocol is the source of the "restart problem." The problem is that handling restart in N different codebases, with none of them ever buggy, is not practical. It isn't impossible, but nobody who has actually written code, to date, has decided the cost:benefit ratio holds up and proceeded to tackle this.

Kroah-Hartman: AF_BUS, D-Bus, and the Linux kernel

Posted Feb 12, 2013 2:22 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)

Hey! I'm not saying that DBUS is somehow completely unacceptable and bad.

Yes, it works pretty well. But it does have shortcomings that could have been avoided by a more careful design. You can make a "reconnectable" messaging protocol pretty easily; it's not rocket surgery. For example, store the current state of the server's subscriptions in durable storage, or introduce an explicit "reconnection" phase.
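A rough sketch of the durable-storage idea, under loud assumptions: `DurableBroker`, the JSON state file, and the client ids are all hypothetical and have nothing to do with how dbus-daemon actually works. The broker writes its subscription table to disk on every change and reloads it at startup.

```python
import json
import os
import tempfile


class DurableBroker:
    """Hypothetical broker that persists its subscription table to a file."""

    def __init__(self, path):
        self.path = path
        self.subscriptions = {}  # client id -> sorted list of topics
        if os.path.exists(path):
            with open(path) as f:
                self.subscriptions = json.load(f)

    def subscribe(self, client_id, topic):
        topics = set(self.subscriptions.get(client_id, []))
        topics.add(topic)
        self.subscriptions[client_id] = sorted(topics)
        self._save()

    def _save(self):
        # Write to a temporary file, then rename over the old state so a
        # crash mid-write cannot leave a truncated file behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.subscriptions, f)
        os.replace(tmp, self.path)

    def recipients(self, topic):
        return sorted(c for c, t in self.subscriptions.items() if topic in t)


state_file = os.path.join(tempfile.mkdtemp(), "subscriptions.json")

broker = DurableBroker(state_file)
broker.subscribe("client-1", "NameOwnerChanged")
broker.subscribe("client-2", "NameOwnerChanged")

del broker                            # simulates the broker process going away
restarted = DurableBroker(state_file)  # a new process reloads the saved table
```

This covers only the subscription table. It assumes clients keep stable ids across the restart and says nothing about re-establishing the connections themselves or about messages in flight when the broker went down.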

Slightly off topic (Kroah-Hartman: AF_BUS, D-Bus, and the Linux kernel)

Posted Feb 12, 2013 10:24 UTC (Tue) by ortalo (guest, #4654)

"rocket surgery": sounds like a nice new discipline to me, doesn't it?
I am left wondering if it involves doing surgery in space or repairing rocket engines? (Hopefully not both...)

Slightly off topic (Kroah-Hartman: AF_BUS, D-Bus, and the Linux kernel)

Posted Feb 12, 2013 14:01 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)

Rocket surgery usually involves doing neurosurgery. With ROCKETS!