
Kernel prepatch 4.0-rc4

Posted Mar 17, 2015 16:03 UTC (Tue) by JGR (subscriber, #93631)
In reply to: Kernel prepatch 4.0-rc4 by sorokin
Parent article: Kernel prepatch 4.0-rc4

Classic dbus already uses Unix domain sockets, and kdbus still uses sockets; the main difference is that the kernel acts as the dbus-daemon instead of having traffic tromboned through various userspace daemons, which is inefficient and makes things like namespacing awkward.

Just because sockets were a better fit than DCOM for your particular project does not necessarily make DCOM, or RPC protocols in general, inherently bad.

You could argue that dbus is overused, or that plain Unix domain sockets with application-specific protocols would be adequate, but it'd still be useful to make existing dbus traffic more efficient.



Kernel prepatch 4.0-rc4

Posted Mar 17, 2015 16:54 UTC (Tue) by sorokin (guest, #88478) [Link] (14 responses)

> You could argue that dbus is overused, or that plain Unix domain sockets with application-specific protocols would be adequate
Yes. This is exactly my point.

I would like to emphasize that it is not necessary to create a new protocol from scratch. One could use an existing serialization library and an existing sockets library; in that case the complexity of implementing a new protocol is minimal.
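
For illustration, a minimal sketch of what such a thin layer could look like: an existing sockets API plus simple length-prefixed framing, with any serialization library filling in the payload bytes. The framing format and helper names here are illustrative, not taken from the comment.

#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>
#include <string>

// Frame = 4-byte length prefix + payload. Real code must loop on short
// reads/writes; this is abbreviated for clarity.
bool send_frame(int fd, const std::string& payload)
{
    uint32_t len = payload.size();
    return write(fd, &len, sizeof len) == (ssize_t)sizeof len
        && write(fd, payload.data(), len) == (ssize_t)len;
}

bool recv_frame(int fd, std::string& payload)
{
    uint32_t len = 0;
    if (read(fd, &len, sizeof len) != (ssize_t)sizeof len)
        return false;
    payload.resize(len);
    return read(fd, &payload[0], len) == (ssize_t)len;
}

// Connect to a Unix domain socket at the given path.
int connect_unix(const char* path)
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);
    if (connect(fd, (sockaddr*)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}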

Kernel prepatch 4.0-rc4

Posted Mar 17, 2015 17:36 UTC (Tue) by raven667 (subscriber, #5198) [Link] (12 responses)

If you use an existing library, then wouldn't that be the same as just using dbus, which has existing libraries for many languages and only takes a couple of lines to set up and use? Is the issue not wanting to use this particular brand of IPC, or is there some technical issue that I'm missing from your comment?

Kernel prepatch 4.0-rc4

Posted Mar 18, 2015 12:43 UTC (Wed) by sorokin (guest, #88478) [Link] (11 responses)

> Is the issue not wanting to use this particular brand of IPC, or is there some technical issue that I'm missing from your comment?

There were a few technical issues. Although I experienced them with DCOM, I think they are inherent to any RPC mechanism with synchronous calls. If they are somehow solved in D-Bus, I will be glad to hear how. Here is the list, in no particular order. (Sorry in advance for my bad English.)

1. The most important aspect of every RPC is how incoming calls are handled. I know two models for handling incoming calls: message loop (aka single-threaded apartment in COM) and thread pool (aka multi-threaded apartment in COM).

In the first case all incoming calls are serialized through the message loop and handled one by one. This approach has the advantage that one doesn't have to worry about multithreading, and it works very well for simple programs. Problems arise in more complicated cases. Suppose we have two processes A and B. A makes a call foo() to B. While handling this call, B calls bar() on A. And to handle bar(), A calls baz() on B. To handle bar() and baz(), both A and B must pump a nested message loop inside any outgoing call. The problem with DCOM is that this nested message loop handles not only incoming calls from the process the outgoing call was made to, but from _any_ process. Effectively it means that inside _any_ outgoing call, _any_ incoming call can be handled, and the program should be prepared for that: after making an outgoing call, the program state could have changed arbitrarily. Needless to say, this leads to crazy reentrancy bugs. It is also difficult to track all the places where outgoing calls are made. In my case, for example, the logger was making outgoing calls (because one of its appenders sent messages to another process), which meant that inside the LOG macro _any_ incoming function could be handled. As the LOG macro was quite popular, this meant that _any_ function could be called from _any_ other.

* To make things worse, not only incoming calls but arbitrary window messages can be handled inside any outgoing call. It means that, for example, WM_TIMER (the equivalent of timerfd) can be handled inside outgoing calls.
* One could argue that the problem is running a nested message loop inside an outgoing call. But this is necessary. Suppose we have two processes A and B, and at some point A decides to call B while B decides to call A almost simultaneously. Without a nested message loop these two processes will deadlock.
* One could argue that the problem is that the nested message loop handles all incoming calls, when it should only handle incoming calls from the process the outgoing call was made to. That doesn't help either: suppose we have three processes A, B and C, where A calls B, B calls C, and C calls A.

With a thread pool things are worse. I tried to write a simple server in a multi-threaded apartment and it was a pain. When A calls B and B calls A back, the original thread in A is not the same thread that handles the incoming call from B. This means that a recursive mutex effectively becomes a non-recursive one, and there are lots of mutexes and locks in a multi-threaded apartment server. Things get even worse when there are three processes calling each other.

Theoretically these two approaches could be merged into a better one. Some unique thread-id could be propagated through the calls: if an incoming call has the same thread-id as one of our outgoing calls, it is handled inside that outgoing call, as in a nested message loop; otherwise a new thread from the thread pool is used. This would create the illusion that the two processes share the same set of threads, and it works well with mutexes. But a malicious process could pass a fake thread-id, and we would end up with all the problems of the two approaches described above. So I don't think this is a suitable solution for a general-purpose RPC mechanism.
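
As a rough sketch, here is what that merged dispatch rule could look like, assuming every message carries the logical chain-id of the call tree it belongs to (all names here are hypothetical):

#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// A message carries the logical "chain id" of the call tree it belongs to.
struct Message {
    uint64_t chain_id;
    std::string body;
};

struct Dispatcher {
    // Chain ids for which this process is currently blocked in an outgoing
    // call, mapped to the handler that runs re-entrant work on that same
    // (waiting) thread.
    std::unordered_map<uint64_t, std::function<void(const Message&)>> waiting;

    void route(const Message& m) {
        auto it = waiting.find(m.chain_id);
        if (it != waiting.end())
            it->second(m);   // same call chain: handle inside the blocked call
        else
            run_on_pool(m);  // unrelated chain: hand off to the thread pool
    }

    // Note the weakness from the comment above: a malicious peer can forge
    // chain_id and steer its call onto a blocked thread.
    void run_on_pool(const Message&) { /* enqueue to a worker thread (omitted) */ }
};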

2. The second problem is timeouts. When you make an outgoing call and there is some problem with the network, the call can time out. The problem is that the timeout value is huge. If I remember correctly, in my case the timeout was 2 minutes. Here is a quote (http://blog.matrikonopc.com/index.php/ask-the-experts-opc...):
> "Distributed Component Object Model (DCOM) calls may take a long time to time out if the network is down, or if the server is unavailable. The actual DCOM timeout depends on a number of factors, including which network protocol DCOM is using and whether the server or network went down before or during the call. In some cases the timeout may be up to 6 minutes. DCOM makes use of RPC, and although RPC does allow some amount of control over call timeout, the programmer cannot normally take advantage of it because DCOM does not give access to the underlying RPC binding handle."

This is absolutely insane. It means that it is not possible to make any DCOM calls in a UI thread. In our case the typical number of DCOM clients was 60. If we wait 2 minutes for each, our program will hang for 2 hours! I also want to note that a malicious process could hold a thread in your process indefinitely using a minimal amount of resources (responding to pings and never sending a response). In our case we were calling some 3rd-party DCOM process that sometimes froze our process.
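
For contrast, with plain sockets the caller controls each call's timeout directly; a minimal sketch using poll(2):

#include <poll.h>

// Wait up to timeout_ms for a reply to become readable on fd. Returns false
// on timeout or error, so the caller can give up as quickly as it likes.
bool wait_readable(int fd, int timeout_ms)
{
    pollfd p{fd, POLLIN, 0};
    return poll(&p, 1, timeout_ms) == 1 && (p.revents & POLLIN);
}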

3. The third problem was performance. At some point we ran into the problem that under some workloads the throughput of the logger was the bottleneck. Our logger had a function, say, LogMessage(ILogMessage*), and ILogMessage was

struct ILogMessage
{
    virtual HRESULT GetText(BSTR*) = 0;
    virtual HRESULT GetLevel(LogLevel*) = 0;
    virtual HRESULT GetTime(TimeStamp*) = 0;
    ...
};

Suppose process A calls LogMessage() on process B. Then process B calls GetText() back to process A, then GetLevel(), then GetTime(), and finally returns control to process A. This interface is used both in-process and between processes. In-process it is reasonable and worked well. Between processes it is insanely inefficient, as it does 8 pings between A and B instead of one. The correct way to pass log messages between processes is to collect them and pass them in bulk, but this is not how we pass data inside a single process. The round-trip time between processes is very expensive, while it is extremely cheap inside a single process. I think the main problem with synchronous RPC is that it creates the illusion that we are simply calling functions, when in reality we are not.
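
A sketch of the batching fix described above; the record type and the flush threshold are illustrative:

#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// A log record passed by value: no callbacks into the sender needed.
struct LogRecord {
    std::string text;
    int level;
    int64_t time_ns;
};

class BulkLogger {
    std::vector<LogRecord> pending_;
public:
    void log(LogRecord r) {
        pending_.push_back(std::move(r));
        if (pending_.size() >= 256)   // threshold is arbitrary
            flush();
    }
    void flush() {
        // Serialize all of pending_ into one message and send it in a single
        // IPC call: one round trip instead of four per record.
        pending_.clear();
    }
};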

4. Custom marshalling. As time passed, more and more complicated structures needed to be passed through DCOM: std::vector, boost::optional, std::pair, or just custom C++ structures. The problem with IDL and DCOM is that they know nothing about C++ templates or the standard library, so we had to write a lot of custom boilerplate code to convert our data structures into something COM understands. Eventually we ended up with something like DispatchMessage(void const*, size_t) and did all the serialization manually.
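
A minimal sketch of that end state: flatten the structures into a byte buffer yourself and ship the bytes through a single blob-passing entry point. The encoding here (length-prefixed, host byte order) is just for illustration:

#include <cstdint>
#include <string>
#include <vector>

using Buffer = std::vector<uint8_t>;

void put_u32(Buffer& out, uint32_t v)
{
    // Host byte order for brevity; a real protocol would pin the endianness.
    const uint8_t* p = reinterpret_cast<const uint8_t*>(&v);
    out.insert(out.end(), p, p + sizeof v);
}

void put_string(Buffer& out, const std::string& s)
{
    put_u32(out, s.size());
    out.insert(out.end(), s.begin(), s.end());
}

// std::vector<T> becomes count + elements; std::pair and custom structs
// compose the same way, field by field.
template <class T, class Put>
void put_vector(Buffer& out, const std::vector<T>& v, Put put_elem)
{
    put_u32(out, v.size());
    for (const T& x : v)
        put_elem(out, x);
}

// The finished buffer is then handed to the equivalent of
// DispatchMessage(buf.data(), buf.size()).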

5 (CORBA). I wrote a small server using CORBA, and CORBA has another problem: when a client is killed, the references it held to your objects stay incremented. It means that a malicious client could easily leak your memory (http://www.davidchappell.com/writing/article_Refer_Counti...).

One could argue that problems 1-3 arise only if one uses synchronous calls, and that async calls (in COM, https://msdn.microsoft.com/en-us/magazine/bb984898.aspx) or oneway calls (in CORBA) don't have these problems. I think that if one uses only async calls, one doesn't need COM at all, because good socket and serialization libraries could serve one's needs. If one uses synchronous calls, one has problems 1-3. I also argue that using sockets is much simpler than using async calls in DCOM! It is worth noting that a good serialization library can solve problem 4.

Kernel prepatch 4.0-rc4

Posted Mar 18, 2015 13:45 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

So let's imagine that you have a simple socket server, and go through the list:

> 1. The most important aspect of every RPC is how incoming calls are handled. I know two models for handling incoming calls: message loop (aka single-threaded apartment in COM) and thread pool (aka multi-threaded apartment in COM).
This applies just as well. You can have process A establish a connection to process B, and then process B try to call process A during the loop dispatch. Deadlock.

That's a bad idea. Don't do that.

Besides, DBUS is actually an asynchronous message bus by design - synchronous calls are only emulated there.

Besides, about the only valid use-case for this scenario is notification callbacks. And those are handled perfectly well by the bus itself (which does multicast and buffering).

> 2. The second problem is timeouts. When you make an outgoing call and there is some problem with the network, the call can time out. The problem is that the timeout value is huge. If I remember correctly, in my case the timeout was 2 minutes. Here is a quote (http://blog.matrikonopc.com/index.php/ask-the-experts-opc...):
TCP idle timeout is 2 hours.

However, DBUS can reliably detect disconnection from the bus (in case of server crash, for example) and inform the calling process about that.

> 3. The third problem was performance. At some point we ran into the problem that under some workloads the throughput of the logger was the bottleneck. Our logger had a function, say, LogMessage(ILogMessage*), and ILogMessage was
DBUS is asynchronous and supports message buffering. It's possible to overflow the buffers, but there's nothing to be done about it.

> 4. Custom marshalling. As time passed, more and more complicated structures needed to be passed through DCOM.
So add your own code for marshalling. DBUS doesn't care about your message content.

You can even use memfd to seal and pass large arbitrary data; the kernel guarantees that the data cannot be tampered with by the sender after the message has been submitted.
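
A sketch of what that looks like with the memfd API (Linux-specific; error handling abbreviated). Once the seals are applied, the receiver can map the descriptor knowing the sender can no longer change the bytes:

#include <sys/mman.h>   // memfd_create (glibc 2.27+)
#include <fcntl.h>      // F_ADD_SEALS, F_SEAL_*
#include <unistd.h>
#include <cstddef>

int make_sealed_blob(const void* data, size_t len)
{
    int fd = memfd_create("blob", MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, len) < 0 || write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    // Forbid any further resizing or writing; from here on the contents
    // are immutable from the receiver's point of view.
    fcntl(fd, F_ADD_SEALS, F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_WRITE);
    return fd;  // hand this fd over the bus (e.g. via fd passing)
}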

> 5 (CORBA). I wrote a small server using CORBA, and CORBA has another problem: when a client is killed, the references it held to your objects stay incremented.
KDBUS, being kernel-level functionality, knows about dead clients and disconnects them correctly.

So in short, your experience with DCOM doesn't really apply to DBUS. They are _different_ and are used differently. Remember, the first draft of KDBUS implemented it as a new socket class.

Kernel prepatch 4.0-rc4

Posted Mar 18, 2015 14:35 UTC (Wed) by sorokin (guest, #88478) [Link] (1 responses)

> This applies just as well. You can have process A establish a connection to process B, and then process B try to call process A during the loop dispatch. Deadlock.

Surely one could replicate the broken behavior with sockets. My point was that synchronous calls are the problem, so we should stop pretending there are synchronous calls and do everything asynchronously.

> Besides, DBUS is actually an asynchronous message bus by design - synchronous calls are only emulated there.

What does "synchronous calls are only emulated" mean?

http://www.freedesktop.org/wiki/Software/systemd/hostnamed/ If I read the PrettyHostname property, don't I block until the property value is received? Also, what does it mean to be asynchronous by design? How can one be asynchronous not by design?

> Besides, about the only valid use-case for this scenario are notification callbacks. And they are handled perfectly well by the bus
> itself (which does multicast and buffering).

While it is true that in a simple system the most common use case for this behavior is callbacks, as the number of processes in a distributed system grows, their interactions become more and more tricky.

> TCP idle timeout is 2 hours.

Again, I argue that one should use an asynchronous API. And this timeout is not a problem with an asynchronous API.

> So add your own code for marshalling. DBUS doesn't care about your message content.
> KDBUS being a kernel-level functionality knows about dead clients and disconnects them correctly.

Domain sockets also don't care about message content. Domain sockets are asynchronous. And the kernel knows about dead clients and disconnects them correctly. Why should I prefer D-Bus to domain sockets?

Also, a program based on domain sockets can relatively easily support regular TCP sockets as well.

Kernel prepatch 4.0-rc4

Posted Mar 18, 2015 15:09 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

> What does "synchronous calls are only emulated" mean? http://www.freedesktop.org/wiki/Software/systemd/hostnamed/ If I read the PrettyHostname property, don't I block until the property value is received?
Nope. You actually send a "GetProperty" message to the "org.freedesktop.DBus.Properties" endpoint, which replies with the property value.

It's asynchronous: you're free to send other messages while the GetProperty message is being processed, and you can actually use this to pipeline messages (send 100 GetProperty messages and then wait for the answers) to minimize the number of context switches.

Of course, most client libraries simply block while waiting for the answer unless you ask otherwise. But that's by no means the only possible behavior.
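
A sketch of that pipelining pattern with the low-level libdbus API; the hostname1 property names follow the hostnamed page linked above, and error handling is omitted:

#include <dbus/dbus.h>
#include <vector>

int main()
{
    DBusConnection* conn = dbus_bus_get(DBUS_BUS_SYSTEM, nullptr);

    const char* iface = "org.freedesktop.hostname1";
    const char* props[] = { "Hostname", "PrettyHostname", "IconName" };
    std::vector<DBusPendingCall*> pending;

    for (const char* prop : props) {
        DBusMessage* msg = dbus_message_new_method_call(
            "org.freedesktop.hostname1", "/org/freedesktop/hostname1",
            "org.freedesktop.DBus.Properties", "Get");
        dbus_message_append_args(msg,
            DBUS_TYPE_STRING, &iface,
            DBUS_TYPE_STRING, &prop,
            DBUS_TYPE_INVALID);

        DBusPendingCall* pc = nullptr;
        dbus_connection_send_with_reply(conn, msg, &pc, -1 /* default timeout */);
        dbus_message_unref(msg);
        pending.push_back(pc);           // don't block yet: keep pipelining
    }

    for (DBusPendingCall* pc : pending) {
        dbus_pending_call_block(pc);     // now collect the answers
        DBusMessage* reply = dbus_pending_call_steal_reply(pc);
        // ... read the variant out of `reply` ...
        dbus_message_unref(reply);
        dbus_pending_call_unref(pc);
    }
    return 0;
}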

> Also, what does it mean to be asynchronous by design? How can one be asynchronous not by design?
By emulating it through thread pools and blocking calls. Like DCOM, for example.

> While it is true that in a simple system the most common use case for this behavior is callbacks, as the number of processes in a distributed system grows, their interactions become more and more tricky.
The first rule of distributed systems: "Don't". Most DBUS systems follow it closely.

> Domain sockets also don't care about message content. Domain sockets are asynchronous. And the kernel knows about dead clients and disconnects them correctly. Why should I prefer D-Bus to domain sockets?

Some of the reasons:
* Built-in efficient multicast.
* Security: the peers which communicate do not have to trust each other, as the only trustworthy component in the game is the kernel.
* More types of metadata can be attached to messages than in userspace.
* Semantics for apps with heavy data payloads (media apps, for instance), with optional priority message dequeuing and global message ordering.
* Being in the kernel closes a lot of races which can't be fixed with the current userspace solutions.
* Eavesdropping on the kernel level, so privileged users can hook into the message stream without hacking support for that into their userspace processes.

Kernel prepatch 4.0-rc4

Posted Mar 18, 2015 13:49 UTC (Wed) by MrWim (subscriber, #47432) [Link] (4 responses)

Thanks for your clear, explicit and well-reasoned response.

> Although I experienced them with DCOM, I think they are inherent to any RPC mechanism with synchronous calls. If they are somehow solved in D-Bus, I will be glad to hear how.

I would recommend taking another look at DBus. It doesn't try to hide the asynchronous nature of IPC, which I understand is one of the underlying reasons you had issues with DCOM.

> 1. ... how incoming calls are handled ...

Because DBus doesn't attempt to be transparent and explicitly exposes asynchrony, a single (unnested) message loop works fine: you don't need to respond to a DBus call at the point where you `return` in C; the reply can be sent later.
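
For example (a sketch against the low-level libdbus API; the names and the single stashed request are purely illustrative), a handler can keep a reference to the request and send the method return later:

#include <dbus/dbus.h>

static DBusMessage* pending_request = nullptr;   // request we will answer later

DBusHandlerResult handle_call(DBusConnection*, DBusMessage* msg, void*)
{
    dbus_message_ref(msg);                 // keep the request alive
    pending_request = msg;
    return DBUS_HANDLER_RESULT_HANDLED;    // note: no reply sent yet
}

void finish_later(DBusConnection* conn, const char* result)
{
    DBusMessage* reply = dbus_message_new_method_return(pending_request);
    dbus_message_append_args(reply, DBUS_TYPE_STRING, &result,
                             DBUS_TYPE_INVALID);
    dbus_connection_send(conn, reply, nullptr);
    dbus_message_unref(reply);
    dbus_message_unref(pending_request);
    pending_request = nullptr;
}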

> 2. ... timeouts ...

On each call the client can specify a timeout, and because the asynchrony is explicitly exposed, you can make a long-running call from your GUI thread without blocking the UI.

> 3. Performance ...

kdbus is an effort to improve this.

> 4. Custom marshalling ...

DBus explicitly doesn't define the shape of the language bindings, so you have the flexibility to produce whatever mapping you like between objects in your application and the DBus wire protocol. See Question 10 in the DBus FAQ.

I have personally implemented sophisticated mappings between C++ objects and DBus, including std::vector (to a dbus array), boost::optional (to a dbus array of 0 or 1 elements), boost::tuple (to a dbus struct) and even custom structs (to dbus structs or maps of string to variant).
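
As an illustration of one such mapping, a sketch with raw libdbus iterators (std::optional stands in for the boost::optional of the time; appending starts from dbus_message_iter_init_append on an outgoing message):

#include <dbus/dbus.h>
#include <cstdint>
#include <optional>
#include <vector>

// std::vector<int32_t> -> D-Bus array of INT32.
void append_vector(DBusMessageIter* it, const std::vector<int32_t>& v)
{
    DBusMessageIter sub;
    dbus_message_iter_open_container(it, DBUS_TYPE_ARRAY,
                                     DBUS_TYPE_INT32_AS_STRING, &sub);
    for (int32_t x : v)
        dbus_message_iter_append_basic(&sub, DBUS_TYPE_INT32, &x);
    dbus_message_iter_close_container(it, &sub);
}

// optional<int32_t> -> D-Bus array of 0 or 1 elements, as described above.
void append_optional(DBusMessageIter* it, const std::optional<int32_t>& o)
{
    DBusMessageIter sub;
    dbus_message_iter_open_container(it, DBUS_TYPE_ARRAY,
                                     DBUS_TYPE_INT32_AS_STRING, &sub);
    if (o)
        dbus_message_iter_append_basic(&sub, DBUS_TYPE_INT32, &*o);
    dbus_message_iter_close_container(it, &sub);
}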

> 5. Client disconnects causing leaks

DBus provides messages which you can listen for to tell when your clients disconnect, so you can do any necessary cleanup. Using this, I managed to extend the semantics of boost::shared_ptr over DBus. See #38784

Kernel prepatch 4.0-rc4

Posted Mar 18, 2015 17:52 UTC (Wed) by sorokin (guest, #88478) [Link] (3 responses)

> I would recommend taking another look at DBus. It doesn't try to hide the asynchronous nature of IPC, which I understand is one of the underlying reasons you had issues with DCOM.

If it provides some easy-to-use asynchronous API, I think it has merit. Do you know of any good C++ bindings for D-Bus (that are aware of the standard library and typical C++ idioms)? A quick googling turns up dbustl and boost-dbus. Probably there is something else.

Kernel prepatch 4.0-rc4

Posted Mar 18, 2015 18:07 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

Just use the original libdbus ( http://dbus.freedesktop.org/ ). There are several wrappers but they are simply not worth it.

Though boost-dbus looks really good (if you happen to use Boost.Asio).

Kernel prepatch 4.0-rc4

Posted Mar 19, 2015 22:28 UTC (Thu) by mbiebl (subscriber, #41876) [Link]

Kernel prepatch 4.0-rc4

Posted Mar 19, 2015 22:46 UTC (Thu) by MrWim (subscriber, #47432) [Link]

> If it provides some easy-to-use asynchronous API, I think it has merit. Do you know of any good C++ bindings for D-Bus (that are aware of the standard library and typical C++ idioms)?

It's been a few years since I've looked at this. One of the problems with C++ is that everyone has a different idea of "standard idioms" and of which subset of C++ is reasonable and in good taste.

Back at the time I was using a heavily modified version of dbus-c++ with a built-in template to generate code from the DBus XML that fitted the idioms of our project. In that case we had an in-house implementation of futures, which made async about as convenient as possible in C++03. Given our requirements at the time, I think it would have been easier to build on top of libdbus itself rather than starting with dbus-c++, but that's easy to say in hindsight.

> A quick googling turns up dbustl and boost-dbus. Probably there is something else.

I've not looked at either of these before, as the work I was doing pre-dates them. Neither seems to include a code generator, so there is a limit to how convenient they're going to be, depending on your use case.

Kernel prepatch 4.0-rc4

Posted Mar 18, 2015 14:31 UTC (Wed) by JGR (subscriber, #93631) [Link] (2 responses)

> One could argue that problems 1-3 arise only if one uses synchronous calls, and that async calls (in COM, https://msdn.microsoft.com/en-us/magazine/bb984898.aspx) or oneway calls (in CORBA) don't have these problems. I think that if one uses only async calls, one doesn't need COM at all, because good socket and serialization libraries could serve one's needs. If one uses synchronous calls, one has problems 1-3. I also argue that using sockets is much simpler than using async calls in DCOM! It is worth noting that a good serialization library can solve problem 4.

One could equally implement synchronous calls using standard socket and serialisation libraries, which would lead to problems 1-3 in much the same way. As I read it, your post is more about synchronous vs asynchronous RPC/messaging than about DCOM vs a custom RPC over sockets.

That said, problems 1-2 can be avoided whilst still using synchronous RPC if the thread making the outbound request does not have its own event loop which runs during the request. One way to implement that would be to have the event loop dispatch (a subset of) events to a thread pool, though that is not fantastically efficient.

For something like a logger, a fire-and-forget message should be adequate, as there's no point blocking yourself waiting for an acknowledgement which doesn't really tell you anything.
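
In D-Bus terms that maps naturally onto a call flagged as expecting no reply. A sketch with libdbus, using made-up service names:

#include <dbus/dbus.h>

// conn is an already-connected DBusConnection.
void log_fire_and_forget(DBusConnection* conn, const char* text)
{
    DBusMessage* msg = dbus_message_new_method_call(
        "com.example.Logger", "/com/example/Logger",
        "com.example.Logger", "LogMessage");
    dbus_message_append_args(msg, DBUS_TYPE_STRING, &text, DBUS_TYPE_INVALID);
    dbus_message_set_no_reply(msg, TRUE);     // don't wait for an ack
    dbus_connection_send(conn, msg, nullptr);
    dbus_message_unref(msg);
}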

> This is absolutely insane. It means that it is not possible to use any DCOM calls in UI thread.
How is this insane? Any kind of synchronous non-local network IO in a UI thread is a non-starter. Even synchronous local network or disk IO is potentially problematic.

Kernel prepatch 4.0-rc4

Posted Mar 18, 2015 18:00 UTC (Wed) by sorokin (guest, #88478) [Link] (1 responses)

> One could equally implement synchronous calls using standard socket and serialisation libraries, which would lead to problems 1-3 in much the same way. As I read it, your post is more about synchronous vs asynchronous RPC/messaging than about DCOM vs a custom RPC over sockets.

Absolutely.

> That said, problems 1-2 can be avoided whilst still using synchronous RPC if the thread making the outbound request does not have its own event loop which runs during the request. One way to implement that would be to have the event loop dispatch (a subset of) events to a thread pool, though that is not fantastically efficient.

This is what we implemented. DCOM supports asynchronous calls, but I don't remember why they were not used.

> For something like a logger, a fire-and-forget message should be adequate, as there's no point blocking yourself waiting for an acknowledgement which doesn't really tell you anything.

Agreed. This is why I think CORBA-like oneway calls are sound.

> Any kind of synchronous non-local network IO in a UI thread is a non-starter. Even synchronous local network or disk IO is potentially problematic.

I was not the original author of the program. When I started working on it, it was already riddled with synchronous calls to other services, and this was baked deep into the program logic. It was also sometimes difficult to track which calls could be synchronous, because the same interface could be implemented by a local object or by a remote one.

Kernel prepatch 4.0-rc4

Posted Mar 18, 2015 18:11 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

> This is what we implemented. DCOM supports asynchronous calls, but I don't remember why they were not used.
AFAIR, they were first supported in WinXP and were never really popular.

Kernel prepatch 4.0-rc4

Posted Mar 17, 2015 18:54 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

So what's the advantage of using a custom (and inevitably hacky) protocol instead of DBUS, which is available for most languages and is well supported?

DCOM was just a _bad_ way to do RPC.

