Kernel prepatch 4.0-rc4
Posted Mar 16, 2015 7:10 UTC (Mon) by JMB (guest, #74439)
Parent article: Kernel prepatch 4.0-rc4
kdbus is queued up to be merged for 4.1-rc1 and entered linux-next.
That's quite fast compared to other projects -
maybe others can learn from that example.
Posted Mar 16, 2015 9:12 UTC (Mon)
by yoshi314 (guest, #36190)
[Link] (2 responses)
It's still a good reference for others in terms of submission quality (docs, test suite, etc.).
Posted Mar 16, 2015 10:50 UTC (Mon)
by farnz (subscriber, #17727)
[Link]
Also a good reference for others in terms of reaction to review - it went from being a char device to being a filesystem because review said a char device was the wrong thing to do. IOW, the problem solved hasn't changed, but the solution presented has changed drastically.
Posted Mar 16, 2015 20:15 UTC (Mon)
by flussence (guest, #85566)
[Link]
Posted Mar 16, 2015 13:00 UTC (Mon)
by zenor (guest, #100805)
[Link] (24 responses)
Posted Mar 16, 2015 13:08 UTC (Mon)
by karath (subscriber, #19025)
[Link] (22 responses)
Are you prepared to give specific reasons why it is crap? Or is this a kneejerk reaction because it arose from the needs of the systemd project?

On the other hand, totally agree that "entering linux-next != automatically merged next window".
Posted Mar 17, 2015 11:11 UTC (Tue)
by fishface60 (subscriber, #88700)
[Link]
If that is the cause of the reaction then it's not even grounded in fact.

The first pushes for this came from the automotive people, who wanted a faster D-Bus and had their own variety of strange implementations. One of these went into one of Greg K.H.'s long-term support trees, which he wasn't happy about, so he started planning an implementation that stood a chance of getting in.

It started out as a character device because the networking subsystem maintainers would never accept it as another socket type, since AF_BUS was pretty much a duplicate of AF_UNIX except with some bits tweaked.

It was only later that the systemd guys started wanting it so they could have a non-hacky early-boot D-Bus, since currently they manage by starting their own private D-Bus for internal IPC. They later need to register themselves with the system bus after the proper D-Bus daemon is started for the public D-Bus API, so it would be nice to remove the need for the private bus.
Posted Mar 17, 2015 15:11 UTC (Tue)
by sorokin (guest, #88478)
[Link] (20 responses)
> prepared to give specific reasons why it is crap? Or is this a kneejerk
> reaction because it arose from the needs of the systemd project?

Every time there is a discussion about kdbus, I ask people: "why does kdbus need to be in the kernel? What problem is it supposed to solve that cannot be solved with UNIX domain sockets?". I haven't got a convincing answer so far.
I worked on software that used DCOM (Distributed COM, an RPC mechanism by Microsoft) extensively. DCOM has a lot of problems, both in terms of performance and complexity. Then the software was partially rewritten to use sockets. After this rewrite, simplicity and performance improved drastically. Since then I'm convinced that people who are inventing RPC solve non-existing problems.
This doesn't mean that kdbus as an implementation is bad in some way. I'm pretty sure it is working, solid code. The problem is that this code is probably not needed at all. I think this is why some people call kdbus crap.
Posted Mar 17, 2015 16:03 UTC (Tue)
by JGR (subscriber, #93631)
[Link] (15 responses)
Just because sockets were a better fit for your particular project than DCOM does not necessarily make DCOM, or RPC protocols in general, inherently bad.
You could argue that dbus is overused, or that plain Unix domain sockets with application-specific protocols would be adequate, but it'd still be useful to make existing dbus traffic more efficient.
Posted Mar 17, 2015 16:54 UTC (Tue)
by sorokin (guest, #88478)
[Link] (14 responses)
Yes. This is exactly my point.

I would like to emphasize that it is not necessary to create a new protocol from scratch. One can use an existing serialization library and an existing sockets library. In that case the complexity of implementing a new protocol is minimal.
Posted Mar 17, 2015 17:36 UTC (Tue)
by raven667 (subscriber, #5198)
[Link] (12 responses)
Posted Mar 18, 2015 12:43 UTC (Wed)
by sorokin (guest, #88478)
[Link] (11 responses)
There were a few technical issues. Although I experienced them with DCOM, I think they are essential to any RPC mechanism with synchronous calls. If they are solved in D-Bus somehow, I will be glad to hear how. Here is a list, in no particular order. (Sorry for my bad English in advance.)
1. The most important aspect of every RPC mechanism is how incoming calls are handled. I know two models for handling incoming calls: a message loop (aka single-threaded apartment in COM) and a thread pool (aka multi-threaded apartment in COM).
In the first case all incoming calls are serialized by the message loop and handled one by one. This approach has the advantage that one doesn't have to worry about multithreading, and it works very well for simple programs. Problems arise in complicated cases. Consider two processes A and B. A makes a call foo() to B. While handling this call, B calls bar() on A. And to handle bar(), A calls baz() on B. To handle bar() and baz(), both A and B have to pump a nested message loop inside any outgoing call. The problem with DCOM is that this nested message loop handles incoming calls not only from the process we are calling into, but from _any_ process. Effectively it means that inside _any_ outgoing call _any_ incoming call can be handled, and the program should be prepared for that. It means that after making an outgoing call, the program state could have changed arbitrarily. Needless to say, this leads to crazy reentrancy bugs. It is also difficult to track all the places where outgoing calls are made. For example, in my case the logger was making outgoing calls (because one of the appenders sent messages to another process), which means that inside the LOG macro _any_ incoming call could be handled. As the LOG macro was quite popular, it means that _any_ function could end up being called from _any_ other.
* To make things worse, not only incoming calls can be handled inside any outgoing call, but any window messages too. It means that, for example, WM_TIMER (the equivalent of timerfd) can be handled inside outgoing calls.
* One could argue that the problem is running a nested message loop inside an outgoing call. But this is necessary. Consider two processes A and B, where at some point A decides to call B and B decides to call A almost simultaneously. If there is no nested message loop, these two processes will deadlock.
* One could argue that the problem is that the nested message loop handles all incoming calls, while it should only handle incoming calls from the process the outgoing call is made to. That doesn't work either. Consider three processes A, B and C, where A calls B, B calls C, and C calls A.
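To make the reentrancy hazard concrete, here is a minimal single-process sketch (illustrative stand-ins, not actual COM APIs): pumping a nested loop inside an outgoing call lets an unrelated handler mutate state the caller assumed was stable.

#include <functional>
#include <iostream>
#include <queue>
#include <string>

std::queue<std::function<void()>> message_loop; // queued incoming calls
std::string shared_state = "consistent";

// Nested message loop pumped while waiting for a reply: it drains ALL
// pending incoming calls, not just the reply we are waiting for.
void pump_until_reply() {
    while (!message_loop.empty()) {
        message_loop.front()();
        message_loop.pop();
    }
}

void outgoing_foo() {
    std::string snapshot = shared_state; // looks safe...
    pump_until_reply();                  // ...but handlers run in here
    if (snapshot != shared_state)
        std::cout << "state changed under us: reentrancy bug\n";
}

int main() {
    // An unrelated incoming call that arrived just before we call out.
    message_loop.push([] { shared_state = "mutated"; });
    outgoing_foo();
}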
In the case of a thread pool things are worse. I tried to write a simple server in a multi-threaded apartment and it was a pain. When A calls B and B calls A back, the original thread of A is not the same thread that handles the incoming call from B. It means that a recursive mutex becomes a non-recursive one. And there are lots of mutexes and locks in a multi-threaded apartment server. Things get worse when there are three processes calling each other.
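A small demonstration of the recursive-mutex point, under the assumption that the callback is dispatched by a pool thread: std::recursive_mutex only helps when the re-entry happens on the same thread.

#include <iostream>
#include <mutex>
#include <thread>

std::recursive_mutex state_lock;

// Simulates B calling bar() back into us while foo() still holds the lock.
void incoming_bar() {
    if (state_lock.try_lock()) {
        std::cout << "same thread: recursive lock re-entered fine\n";
        state_lock.unlock();
    } else {
        // A blocking lock() here would deadlock instead of failing.
        std::cout << "pool thread: recursive_mutex does not recognize us\n";
    }
}

int main() {
    std::lock_guard<std::recursive_mutex> g(state_lock); // foo() in progress
    incoming_bar();                  // message-loop model: same thread, works
    std::thread pool_thread(incoming_bar); // thread-pool model: other thread
    pool_thread.join();
}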
Theoretically these two approaches could be merged into a better one. Probably some unique thread-id could be propagated through the calls. If an incoming call has the same thread-id as one of our outgoing calls, it is handled inside this outgoing call as in a nested message loop; otherwise a new thread from the thread pool is used. This would create the illusion that two processes share the same set of threads, and it works well with mutexes. But a malicious process could pass a fake thread-id, and we would end up with all the problems of the two approaches described above. So I don't think this is a suitable solution for a general-purpose RPC mechanism.
2. The second problem is timeouts. When you are making an outgoing call and there is some problem with the network, the call can time out. The problem is that the timeout value is huge. If I remember correctly, in my case the timeout was 2 minutes. Here is a quote (http://blog.matrikonopc.com/index.php/ask-the-experts-opc...):

> "Distributed Component Object Model (DCOM) calls may take a long time to time out if the network is down, or if the server is unavailable. The actual DCOM timeout depends on a number of factors, including which network protocol DCOM is using and whether the server or network went down before or during the call. In some cases the timeout may be up to 6 minutes. DCOM makes use of RPC, and although RPC does allow some amount of control over call timeout, the programmer cannot normally take advantage of it because DCOM does not give access to the underlying RPC binding handle."
This is absolutely insane. It means that it is not possible to make any DCOM calls in the UI thread. In our case the typical number of DCOM clients was 60. If we wait for each one for 2 minutes, it means our program will hang for 2 hours! Also I want to note that a malicious process could hold a thread in your process indefinitely using a minimal amount of resources (responding to pings and never sending a response). In our case we were calling some 3rd-party DCOM process that sometimes froze our process.
3. The third problem was performance. At some point we ran into the problem that under some workloads the throughput of the logger was the bottleneck. Our logger had a function, say, LogMessage(ILogMessage*), and ILogMessage was

struct ILogMessage
{
    virtual HRESULT GetText(BSTR*);
    virtual HRESULT GetLevel(LogLevel*);
    virtual HRESULT GetTime(TimeStamp*);
    ...
};
Consider process A calling LogMessage() on process B. Process B then calls GetText() back to process A, then GetLevel(), then GetTime(), and finally returns control to process A. This interface is used both in-process and between processes. In-process it is reasonable and works well. Between processes it is insanely inefficient, as it does 8 pings between A and B instead of one. The correct way to pass log messages between processes is to collect them and pass them in bulk, but this is not how we pass data inside a single process. The round-trip time between processes is very expensive, but it is extremely cheap inside a single process. I think the main problem with synchronous RPC is that it creates the illusion that we are simply calling functions, while in reality we are not.
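For contrast, a hypothetical batch-oriented version of the interface (names invented for illustration): the whole record travels inside the request, so one cross-process call replaces four.

#include <cstdint>
#include <string>
#include <vector>

// Plain data instead of an interface pointer: everything is serialized
// into the request, so the callee never has to call back for the fields.
struct LogRecord {
    std::string  text;
    int          level;
    std::int64_t timestamp;
};

struct ILogSink {
    // One round trip delivers N records; the chatty version needed
    // four calls (eight one-way messages) per record.
    virtual void LogMessages(const std::vector<LogRecord>& batch) = 0;
    virtual ~ILogSink() = default;
};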
4. Custom marshalling. As time passed, more and more complicated structures needed to be passed through DCOM: std::vector, boost::optional, std::pair, or just some custom C++ structures. The problem with IDL and DCOM is that they know nothing about C++, templates and the standard library. So we had to write a lot of custom boilerplate code to convert our data structures into something that COM understands. Finally we ended up with something like DispatchMessage(void const*, size_t) and did all the serialization manually.
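A minimal sketch of that endpoint style, with an invented length-prefixed format (host byte order, illustration only):

#include <cstdint>
#include <string>
#include <vector>

static void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    const auto* p = reinterpret_cast<const std::uint8_t*>(&v);
    out.insert(out.end(), p, p + sizeof v);             // host byte order
}

static void put_string(std::vector<std::uint8_t>& out, const std::string& s) {
    put_u32(out, static_cast<std::uint32_t>(s.size())); // length prefix
    out.insert(out.end(), s.begin(), s.end());
}

// std::vector<std::string> -> flat buffer that a
// DispatchMessage(void const*, size_t)-style endpoint can ship as-is.
std::vector<std::uint8_t> serialize(const std::vector<std::string>& v) {
    std::vector<std::uint8_t> out;
    put_u32(out, static_cast<std::uint32_t>(v.size()));
    for (const auto& s : v)
        put_string(out, s);
    return out;
}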
5 (CORBA). I wrote a small server using CORBA, and CORBA has another problem. When a client is killed, the references it holds to your objects stay incremented. It means that a malicious client could easily leak your memory (http://www.davidchappell.com/writing/article_Refer_Counti...).
One could argue that problems 1-3 arise only if one uses synchronous calls, and that async (in COM, https://msdn.microsoft.com/en-us/magazine/bb984898.aspx) / one-way (in CORBA) calls don't have these problems. I think that if one uses only async calls, one doesn't need COM at all, because good socket and serialization libraries could serve one's needs. If one uses synchronous calls, one has problems 1-3. Also I argue that using sockets is much simpler than using async calls in DCOM! It is worth noting that a good serialization library can solve problem 4.
Posted Mar 18, 2015 13:45 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
> 1. The most important aspect of every RPC mechanism is how incoming calls are handled. I know two models for handling incoming calls: a message loop (aka single-threaded apartment in COM) and a thread pool (aka multi-threaded apartment in COM).
That's a bad idea. Don't do that.

This applies just as well here: you can have process A establishing a connection to process B, and then process B trying to call process A during the loop dispatch. Deadlock.
Besides, DBUS is actually an asynchronous message bus by design - synchronous calls are only emulated there.
Besides, about the only valid use-case for this scenario is notification callbacks. And they are handled perfectly well by the bus itself (which does multicast and buffering).
> 2. The second problem is timeouts. When you are making an outgoing call and there is some problem with the network, the call can time out. The problem is that the timeout value is huge. If I remember correctly, in my case the timeout was 2 minutes.
TCP idle timeout is 2 hours.

However, DBUS can reliably detect disconnection from the bus (in case of a server crash, for example) and inform the calling process about that.
> 3. The third problem was performance. At some point we ran into the problem that under some workloads the throughput of the logger was the bottleneck. Our logger had a function, say, LogMessage(ILogMessage*), and ILogMessage was
DBUS is asynchronous and supports message buffering. It's possible to overflow the buffers, but there's nothing to be done about it.

> 4. Custom marshalling. As time passed, more and more complicated structures needed to be passed through DCOM.
So add your own code for marshalling. DBUS doesn't care about your message content.

You can even use memfd to seal and pass large arbitrary data. The kernel will guarantee that the data will not be tampered with by the sender after the message submission.
> 5 (CORBA). I wrote a small server using CORBA, and CORBA has another problem. When a client is killed, the references it holds to your objects stay incremented.
KDBUS being a kernel-level functionality knows about dead clients and disconnects them correctly.

So in short, your experience with DCOM doesn't really apply to DBUS. They are _different_ and are used differently. Remember, the first draft of KDBUS implemented it as a new socket class.
Posted Mar 18, 2015 14:35 UTC (Wed)
by sorokin (guest, #88478)
[Link] (1 responses)
Surely one could replicate the broken behavior with sockets. My point is that synchronous calls are the problem. So we should stop pretending there are synchronous calls and do everything asynchronously.
> Besides, DBUS is actually an asynchronous message bus by design - synchronous calls are only emulated there.
What does it mean that "synchronous calls are only emulated"? If I read the property PrettyHostname of hostnamed (http://www.freedesktop.org/wiki/Software/systemd/hostnamed/), don't I block until the property value is received?

Also, what does it mean to be asynchronous by design? How can one be asynchronous not by design?
> Besides, about the only valid use-case for this scenario is notification callbacks. And they are handled perfectly well by the bus itself (which does multicast and buffering).
While it is true that in a simple system the most common use case for this behavior is callbacks, as the number of processes in a distributed system grows, their interactions become more and more tricky.
> TCP idle timeout is 2 hours.
Again, I argue that one should use an asynchronous API. And this timeout is not a problem with an asynchronous API.
> So add your own code for marshalling. DBUS doesn't care about your message content.
> KDBUS being a kernel-level functionality knows about dead clients and disconnects them correctly.

Domain sockets also don't care about message content. Domain sockets are asynchronous. And the kernel knows about dead clients and disconnects them correctly. Why should I prefer D-Bus over domain sockets?
Also, a program based on domain sockets can relatively easily support regular TCP sockets.
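A sketch of that portability point with plain POSIX sockets (error handling trimmed): only the address setup differs; the framing code above read()/write() is shared.

#include <arpa/inet.h>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

// AF_UNIX connect: everything after this returns is plain read()/write().
int connect_unix(const char* path) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    std::strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// AF_INET connect: the same protocol code can then run over TCP unchanged.
int connect_tcp(const char* ip, unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}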
Posted Mar 18, 2015 15:09 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Nope. You actually send a "GetProperty" message to the "org.freedesktop.DBus.Properties" endpoint, which replies with the property value.

It's asynchronous: you're free to send other messages while the GetProperty message is being processed, and you can actually use this to pipeline messages (send 100 GetProperty messages and then wait for the answers) to minimize the number of context switches.
Of course, most client libraries simply block while waiting for the answer unless you ask otherwise. But that's by no means the only possible behavior.
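To illustrate the pipelining point, a sketch using the sd-bus client library (assumed available; error handling elided): all the Get calls are written out before any reply is drained, so the round trips overlap instead of serializing.

#include <cstdint>
#include <systemd/sd-bus.h>

static int on_reply(sd_bus_message* reply, void* userdata, sd_bus_error*) {
    int* pending = static_cast<int*>(userdata);
    --*pending;                              // one answer drained
    return 0;
}

int main() {
    sd_bus* bus = nullptr;
    sd_bus_open_system(&bus);

    int pending = 100;
    for (int i = 0; i < 100; ++i)            // fire off 100 Get calls...
        sd_bus_call_method_async(bus, nullptr,
            "org.freedesktop.hostname1", "/org/freedesktop/hostname1",
            "org.freedesktop.DBus.Properties", "Get",
            on_reply, &pending,
            "ss", "org.freedesktop.hostname1", "PrettyHostname");

    while (pending > 0) {                    // ...then wait for the answers
        if (sd_bus_process(bus, nullptr) > 0)
            continue;                        // may have invoked on_reply
        sd_bus_wait(bus, UINT64_MAX);
    }
    sd_bus_unref(bus);
}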
> Also, what does it mean to be asynchronous by design? How can one be asynchronous not by design?

By emulating it through thread pools and blocking calls. Like DCOM, for example.
> While it is true that in a simple system the most common use case for this behavior is callbacks, as the number of processes in a distributed system grows, their interactions become more and more tricky.

The first rule of distributed systems: "Don't". Most DBUS systems follow it closely.
> Domain sockets also don't care about message content. Domain sockets are asynchronous. And the kernel knows about dead clients and disconnects them correctly. Why should I prefer D-Bus over domain sockets?
Some of the reasons:

* Built-in efficient multicast.
> * Security: The peers which communicate do not have to trust each other, as the only trustworthy component in the game is the kernel
> * More types of metadata can be attached to messages than in userspace
> * Semantics for apps with heavy data payloads (media apps, for instance) with optional priority message dequeuing, and global message ordering.
> * Being in the kernel closes a lot of races which can't be fixed with the current userspace solutions
> * Eavesdropping on the kernel level, so privileged users can hook into the message stream without hacking support for that into their userspace processes
Posted Mar 18, 2015 13:49 UTC (Wed)
by MrWim (subscriber, #47432)
[Link] (4 responses)
> Although I experienced them with DCOM, I think they are essential to any RPC mechanism with synchronous calls. If they are solved in D-Bus somehow, I will be glad to hear how.

Thanks for your clear, explicit and well-reasoned response. I would recommend taking another look at DBus. It doesn't try to hide the asynchronous nature of IPC, which I understand is one of the underlying reasons you had issues with DCOM.

> 1. ... how incoming calls are handled ...

Because DBus doesn't attempt to be transparent, and explicitly exposes the asynchrony, the single (unnested) message loop works fine, as you don't need to respond to a DBus call at the moment you `return` in C.

> 2. ... timeouts ...

On each call the client can specify a timeout, and because the asynchrony is explicitly exposed you can make a long-running call from your GUI thread without blocking the UI too.

> 3. Performance ...

kdbus is an effort to improve this.

> 4. Custom marshalling ...

DBus explicitly doesn't define the shape of the language bindings, so you have the flexibility to produce whatever mapping you like between objects in your application and the DBus wire protocol. See Question 10 in the DBus FAQ. I have personally implemented sophisticated mappings between C++ objects and DBus, including std::vector (to a DBus array), boost::optional (to a DBus array of 0 or 1 elements), boost::tuple (to a DBus struct) and even custom structs (to DBus structs or maps of string to variant).

> 5. Client disconnects causing leaks

DBus provides messages which you can listen for to tell when your clients disconnect and do any necessary cleanup. Using this I managed to extend the semantics of boost::shared_ptr over DBus. See #38784
Posted Mar 18, 2015 17:52 UTC (Wed)
by sorokin (guest, #88478)
[Link] (3 responses)
If they provide some easy-to-use asynchronous API, I think it has merit. Do you know of some good C++ bindings for D-Bus (that are aware of the standard library and typical C++ idioms)? A quick googling gives dbustl and boost-dbus. Probably there is something else.
Posted Mar 18, 2015 18:07 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Though dbus-boost looks really good (if you happen to use Boost.Asio).
Posted Mar 19, 2015 22:46 UTC (Thu)
by MrWim (subscriber, #47432)
[Link]
> If they provide some easy-to-use asynchronous API, I think it has merit. Do you know of some good C++ bindings for D-Bus (that are aware of the standard library and typical C++ idioms)?
> A quick googling gives dbustl and boost-dbus. Probably there is something else.

It's been a few years since I've looked at this. One of the problems with C++ is that everyone has a different idea of "standard idioms" and of the subset of C++ that is reasonable and in good taste.

Back at the time I was using a heavily modified version of dbus-c++, with a template built in to generate code from the DBus XML that fitted the idioms of our project. In that case we had an in-house implementation of futures which made async about as convenient as possible with C++03. I think given our requirements at the time it would have been easier to write on top of libdbus itself, rather than starting with dbus-c++, but that's easy to say in hindsight.

I've not looked at either of these before, as the work I was doing pre-dates them. Neither seems to include a code-generator, so there is a limit to how convenient they're going to be, depending on your use-case.
Posted Mar 18, 2015 14:31 UTC (Wed)
by JGR (subscriber, #93631)
[Link] (2 responses)
One could equally implement synchronous calls using standard socket and serialisation libraries, which would lead to problems 1-3 in much the same way. As I read it, your post is more about synchronous vs asynchronous RPC/messaging, than DCOM vs a custom RPC using sockets.
That said, problems 1-2 can be avoided whilst still using synchronous RPC, if the thread making the outbound request does not have its own event loop which runs during the request. One way to implement that would be to have the event loop dispatch (a subset of) events to a thread pool, though that is not fantastically efficient.
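A toy version of that dispatch model (illustrative types; a real pool would bound and reuse its threads): the loop thread only hands events off, so it can never be re-entered by a blocked handler.

#include <iostream>
#include <thread>

// Illustrative stand-ins for the real event source and handler.
struct Event { int id; };

Event wait_for_next_event() {
    static int next = 0;
    return Event{next++};      // a real loop would block in poll()/epoll
}

void handle(Event ev) {
    // May make a blocking outgoing call; it cannot pump the loop below
    // re-entrantly because it runs on a different thread.
    std::cout << "handled event " << ev.id << "\n";
}

int main() {
    std::thread workers[3];
    for (auto& w : workers) {              // the single, unnested loop
        Event ev = wait_for_next_event();  // dispatch only, never execute
        w = std::thread(handle, ev);
    }
    for (auto& w : workers)
        w.join();
}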
For something like a logger, a fire-and-forget message should be adequate, as there's no point blocking yourself waiting for an acknowledgement which doesn't really tell you anything.
> This is absolutely insane. It means that it is not possible to make any DCOM calls in the UI thread.

How is this insane? Any kind of synchronous non-local network IO in a UI thread is a non-starter. Even synchronous local network or disk IO is potentially problematic.
Posted Mar 18, 2015 18:00 UTC (Wed)
by sorokin (guest, #88478)
[Link] (1 responses)
Absolutely.
> That said, problems 1-2 can be avoided whilst still using synchronous RPC, if the thread making the outbound request does not have its own event loop which runs during the request. One way to implement that would be to have the event loop dispatch (a subset of) events to a thread pool, though that is not fantastically efficient.
This is what we have implemented. DCOM supports asynchronous calls, but I don't remember why they were not used.
> For something like a logger, a fire-and-forget message should be adequate, as there's no point blocking yourself waiting for an acknowledgement which doesn't really tell you anything.
Agree. This is why I think CORBA-like oneway calls are sound.
> Any kind of synchronous non-local network IO in a UI thread is a non-starter. Even synchronous local network or disk IO is potentially problematic.
I was not the original author of the program. When I started working on it, it was already riddled with synchronous calls to other services, and this was baked deep into the program logic. Also, it was sometimes difficult to track which calls could be synchronous, because the same interface could be implemented by a local object or by a remote one.
Posted Mar 18, 2015 18:11 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link]

AFAIR, it was supported first in WinXP and was never really popular.
Posted Mar 17, 2015 18:54 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link]
DCOM was just a _bad_ way to do RPC.
Posted Mar 17, 2015 20:14 UTC (Tue)
by dlang (guest, #313)
[Link]
As I understand it, the issue that UNIX sockets can't solve is broadcast/multicast. There were proposals to add such capabilities to sockets, but they were rejected.
Posted Mar 18, 2015 8:22 UTC (Wed)
by cortana (subscriber, #24596)
[Link]
Additionally, the message payload itself is included in each copy, which is silly, as the D-Bus daemon doesn't care about the content of the messages; it only needs to know how to route them and whether the sender should be allowed to send.
kdbus moves the routing into the kernel and uses shared memory to eliminate the excess copies. The four copies of the daemon-routed path are reduced to a single copy in the case of a KDBUS_MSG_PAYLOAD_VEC message; in the case of KDBUS_MSG_PAYLOAD_MEMFD, the message payload is 'sealed' so that the sending process can no longer modify it, and then mapped directly into the address space of each destination process, enabling zero-copy transmission.
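The sealing mechanism itself is visible from userspace with plain syscalls; here is a sender-side sketch (glibc exposes memfd_create since 2.27; error handling elided):

#include <cstring>
#include <fcntl.h>     // F_ADD_SEALS, F_SEAL_*
#include <sys/mman.h>  // memfd_create, mmap
#include <unistd.h>

// Build a sealed payload: after F_SEAL_WRITE is applied, the sender can
// no longer modify it, so a receiver may mmap() it with no defensive copy.
int make_sealed_payload(const void* data, size_t len) {
    int fd = memfd_create("payload", MFD_ALLOW_SEALING);
    ftruncate(fd, static_cast<off_t>(len));
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    std::memcpy(p, data, len);             // the single copy
    munmap(p, len);                        // no writable mappings may remain
    fcntl(fd, F_ADD_SEALS, F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_WRITE);
    return fd;                             // hand over (kdbus, or SCM_RIGHTS)
}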
More information can be found in two excellent LWN articles: https://lwn.net/Articles/619068/ and https://lwn.net/Articles/580194/; and also in kdbus's documentation at https://d-bus.googlecode.com/git-history/policy/kdbus.txt.
Posted Mar 18, 2015 8:58 UTC (Wed)
by karath (subscriber, #19025)
[Link]
"Reasons why this should be done in the kernel, instead of userspace as
* Performance: Fewer process context switches, fewer copies, fewer
* Security: The peers which communicate do not have to trust each
* More types of metadata can be attached to messages than in userspace
* Semantics for apps with heavy data payloads (media apps, for
* Being in the kernel closes a lot of races which can't be fixed with
* Eavesdropping on the kernel level, so privileged users can hook into
* dbus-daemon is not available during early-boot or shutdown."
Posted Mar 18, 2015 10:13 UTC (Wed)
by jezuch (subscriber, #52988)
[Link]
People who are inventing higher-level languages solve non-existing problems.
People who are inventing application-layer protocols solve non-existing problems.
People who are inventing operating systems solve non-existing problems.
etc.
This is all about layers of abstraction. Your perception is tainted by your experience with DCOM. Try Java's RMI, for example, which has its warts but is pretty nice overall. Telling people that RMI solves a non-existing problem will only make you laughed at :)
Posted Mar 16, 2015 20:45 UTC (Mon)
by rvfh (guest, #31018)
[Link]
If you could at least prove to me that this work had been done with no effort toward good quality, I might come to agree with you. But that is simply not the case here; quite the opposite, in fact.

So please keep this kind of prejudiced comment to yourself next time.
Posted Mar 16, 2015 21:14 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link]
The first version was total crap (with a character device hierarchy, and unsuitable for checkpointing).
The second version was a workable design but still flawed in many ways (interaction with namespaces, name registry): http://thread.gmane.org/gmane.linux.kernel.api/6266 and http://thread.gmane.org/gmane.linux.kernel/1822078
The third submission raised only a few objections (mostly about potential security issues): http://article.gmane.org/gmane.linux.kernel.api/7536
In the fourth version only minor typos were found: http://thread.gmane.org/gmane.linux.kernel/1903942
So developers went and fixed the actual objections raised by subsystem maintainers. That's the way the process is supposed to work.