LWN: Comments on "Fast interprocess messaging" https://lwn.net/Articles/405346/ This is a special feed containing comments posted to the individual LWN article titled "Fast interprocess messaging". en-us Thu, 25 Sep 2025 22:56:26 +0000 Thu, 25 Sep 2025 22:56:26 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Still far from proprietary MPI implementations https://lwn.net/Articles/474607/ https://lwn.net/Articles/474607/ wahern <div class="FormattedComment"> I believe that was also Linus' analysis 6 years ago:<br> <p> <a href="http://lists.freebsd.org/pipermail/freebsd-arch/2006-April/005120.html">http://lists.freebsd.org/pipermail/freebsd-arch/2006-Apri...</a><br> <p> <p> </div> Sat, 07 Jan 2012 00:13:09 +0000 Fast interprocess messaging https://lwn.net/Articles/474593/ https://lwn.net/Articles/474593/ wahern <div class="FormattedComment"> Oops. I should've RTFA. They considered vmsplice() and thought it too much trouble to keep the processes synchronized so messages don't end up buffered in the kernel. I don't see it. Either way, signaling needs to occur between sender and receiver. I suppose all of this really cries out for a proper kernel AIO implementation (assuming there isn't one). The sender or receiver needs a way to queue an op with an associated buffer.<br> <p> </div> Fri, 06 Jan 2012 20:03:42 +0000 Fast interprocess messaging https://lwn.net/Articles/474591/ https://lwn.net/Articles/474591/ wahern <div class="FormattedComment"> Like the reverse of vmsplice(2)? I think vmsplice() would suffice as-is. Even better, it only requires the sender to know about the interface; the receiver can keep using read().<br> <p> Actually, an optimized sockets implementation could accomplish a single copy if both sender and receiver are in the kernel. The kernel could just copy from one buffer directly to the next. But perhaps the code for such an optimization would be too odious.<br> <p> <p> </div> Fri, 06 Jan 2012 19:49:41 +0000 Corosync anyone? https://lwn.net/Articles/406015/ https://lwn.net/Articles/406015/ njs <div class="FormattedComment"> <font class="QuotedText">&gt; a memory area mapped to a "file descriptor" hook on both sides</font><br> <p> I think this is the precise definition of the word "pipe"?<br> </div> Sat, 18 Sep 2010 06:19:26 +0000 Still far from proprietary MPI implementations https://lwn.net/Articles/405851/ https://lwn.net/Articles/405851/ rusty <div class="FormattedComment"> <font class="QuotedText">&gt; With copy_to_process you have 1 copy instead of 2, but there should be 0</font><br> <font class="QuotedText">&gt; copy for MPI to behave better.</font><br> <p> I'm not so sure. I believe that a future implementation could well remap pages, but playing with mappings is not as cheap as you might think, especially if you want the same semantics: the process the pages come from will need to mark them R/O so it can COW them if it tries to change them.<br> <p> I'm pretty sure we'd need a TLB flush on every CPU either task has run on. Nonetheless, you know how much data is involved, so if it turns out to be dozens of pages it might make sense. 
With huge pages or only KB of data, not so much.<br> <p> And if you're transferring MB of data over MPI, you're already in a world of suck, right?<br> <p> Cheers,<br> Rusty.<br> </div> Fri, 17 Sep 2010 05:38:23 +0000 Fast IPC https://lwn.net/Articles/405838/ https://lwn.net/Articles/405838/ cma Yep ;)<br><br> The problem is that with this kind of shared-memory-based IPC it would be possible to code a "self-contained" app that would not depend on typical shared memory, which plain Java code cannot use (I'm not talking about a JNI-based solution). Semaphores, locks and so on would not be needed here, since with this "new IPC model" we would just stick with file/socket I/O programming, making it possible to obtain really awesome inter-process communication latency and throughput using a single programming semantics, like async I/O on top of NIO, epoll, or even libevent/libev.<br><br> The trick is that the kernel should be doing all the complex stuff like cache awareness, NUMA affinity and so on, exposing just what we need: a file descriptor ;) Regards Fri, 17 Sep 2010 03:09:43 +0000 Fast IPC https://lwn.net/Articles/405821/ https://lwn.net/Articles/405821/ vomlehn <div class="FormattedComment"> Ah, I get it. The idea is to copy into memory not visible from the other process. Never mind.<br> </div> Thu, 16 Sep 2010 23:56:18 +0000 Fast IPC https://lwn.net/Articles/405819/ https://lwn.net/Articles/405819/ vomlehn <div class="FormattedComment"> Not sure why you wouldn't just use shared memory, which ensures zero copies, and one of a number of synchronization primitives, depending on your particular needs. If not that, then a vmsplice()/splice() variant could be cooked up.<br> <p> At least at a quick glance, I don't see what any of the other ideas add to the mix.<br> </div> Thu, 16 Sep 2010 23:53:11 +0000 Corosync anyone? https://lwn.net/Articles/405791/ https://lwn.net/Articles/405791/ cma And why not implement something like corosync (http://www.kernel.org/doc/ols/2009/ols2009-pages-61-68.pdf) focusing on performance and scalability?<br><br> I mean, it would be great to have a very scalable Linux IPC with file I/O semantics. It would be very nice to abstract a "shared memory"-like IPC using async I/O back-ends with syscalls like epoll, or even using libevent or libev on top.<br><br> I'm very interested in making a Java-based app talk with very low latency to a C/C++ app, via NIO on the Java side and libevent/libev on the C/C++ side. The point is that no TCP stack (or UNIX sockets) would be used; instead, a memory area mapped to a "file descriptor" hook on both sides (Java and C/C++). Is that possible?<br><br> Any thoughts/ideas?<br> Thu, 16 Sep 2010 21:17:17 +0000 Fast interprocess messaging https://lwn.net/Articles/405688/ https://lwn.net/Articles/405688/ eduard.munteanu <div class="FormattedComment"> I'm not sure why the ownership restriction is needed. Ideally, such an interface would let a process tell the kernel "I'm allowing somebody else to send messages to me". That is, the copy would occur only if a copy_to_process() pairs up with a copy_from_process() and the buffers match. In effect, the processes would negotiate a communication channel; it doesn't really matter who owns them. 
Though yes, I can see that looking at the PID isn't enough to prevent issues; perhaps another authentication scheme is in order?<br> <p> Besides this, it's really good to see IPC performance improvements in the kernel.<br> <p> Any thoughts?<br> <p> </div> Thu, 16 Sep 2010 15:13:26 +0000 Still far from proprietary MPI implementations https://lwn.net/Articles/405646/ https://lwn.net/Articles/405646/ Np237 <div class="FormattedComment"> Indeed, that makes the performance much less predictable. I wonder how well this behaves on real-life codes, though. At least Bull claims their MPI implementation does that, and the single-node performance is impressive.<br> </div> Thu, 16 Sep 2010 13:20:03 +0000 Still far from proprietary MPI implementations https://lwn.net/Articles/405635/ https://lwn.net/Articles/405635/ ejr <div class="FormattedComment"> There's a definition of zero copy floating around often attributed to Don Becker: Zero copy means someone *else* makes the copy.<br> <p> That is more or less what happens in message passing using any shared memory mechanism. What you are describing is plain shared memory. It's perfectly fine to use within a single node, and I've done such a thing within MPI jobs working off large, read-only data sets to good success. (Transparent memory scaling of the data set when you're using multiple MPI processes on one node.) But it's not so useful for implementing MPI.<br> <p> The interface here would help MPI when the receiver has already posted its receive when the send occurs. You then have the one necessary copy rather than two. Also, this interface has the *potential* to be smart with cache invalidation by avoiding caching the output on the sending processor! That is a serious cost; a shared buffer ends up bouncing between processors.<br> </div> Thu, 16 Sep 2010 12:58:10 +0000 Still far from proprietary MPI implementations https://lwn.net/Articles/405634/ https://lwn.net/Articles/405634/ nix <div class="FormattedComment"> Wasn't this sort of thing what the old skas patch for user-mode-linux used to do?<br> <p> </div> Thu, 16 Sep 2010 12:50:23 +0000 Still far from proprietary MPI implementations https://lwn.net/Articles/405625/ https://lwn.net/Articles/405625/ Trelane <div class="FormattedComment"> I wonder if there's some way to swap a page or a number of pages between processes.<br> </div> Thu, 16 Sep 2010 12:14:23 +0000 Still far from proprietary MPI implementations https://lwn.net/Articles/405620/ https://lwn.net/Articles/405620/ Np237 <div class="FormattedComment"> Ideally the kernel should not copy the data at all, but provide a way to map memory pages belonging to one process into the other process, marking them copy-on-write.<br> <p> With copy_to_process you have 1 copy instead of 2, but there should be 0 copy for MPI to behave better.<br> </div> Thu, 16 Sep 2010 12:03:55 +0000 Fast interprocess messaging https://lwn.net/Articles/405597/ https://lwn.net/Articles/405597/ intgr <p>Another problem with opening <tt>/proc/*/mem</tt> is that every process needs to keep a file handle open for every <i>other</i> process that it wants to communicate with. So if you have N processes communicating with each other, they will need N<sup>2</sup> file handles total. Now I'm not sure if this actually matters in the HPC world; they have tons of memory anyway... Just a thought.</p> <p>The alternative is opening the mem file for each message, sending it, and closing it again. 
Maybe it works sufficiently well with the VFS scalability patches, but it still seems inefficient.</p> Thu, 16 Sep 2010 10:26:28 +0000 Fast interprocess messaging https://lwn.net/Articles/405578/ https://lwn.net/Articles/405578/ mjthayer <div class="FormattedComment"> That is nice for processes with the same owner of course, but a limited version for processes with different owners could be even nicer. For instance, if it were possible for a process to open access to a section of its memory to, say, the process at the other end of a socket.<br> </div> Thu, 16 Sep 2010 08:35:22 +0000 Fast interprocess messaging https://lwn.net/Articles/405575/ https://lwn.net/Articles/405575/ mjthayer <div class="FormattedComment"> <font class="QuotedText">&gt; I'm wondering why you cannot achieve copy_*_process using pread and pwrite.</font><br> <font class="QuotedText">&gt; Open /proc/$PID/mem with O_DIRECT (maybe) and use pread / pwrite.</font><br> <p> I suppose that an advantage of copy_*_process would be that it would be more convenient to implement on other systems.<br> </div> Thu, 16 Sep 2010 08:30:16 +0000 Fast interprocess messaging https://lwn.net/Articles/405543/ https://lwn.net/Articles/405543/ nikanth <div class="FormattedComment"> Ah.. this was already discussed.<br> To use /proc/$pid/mem the process needs to be ptraced.<br> Maybe that restriction will be removed instead of adding new syscalls.<br> </div> Thu, 16 Sep 2010 06:47:25 +0000 Fast interprocess messaging https://lwn.net/Articles/405533/ https://lwn.net/Articles/405533/ nikanth <div class="FormattedComment"> I also don't see any inefficiency in using /proc/$pid/mem.<br> Waiting for your mail in that thread on LKML.<br> </div> Thu, 16 Sep 2010 06:13:17 +0000 Fast interprocess messaging https://lwn.net/Articles/405528/ https://lwn.net/Articles/405528/ neilbrown <div class="FormattedComment"> I'm wondering why you cannot achieve copy_*_process using pread and pwrite.<br> <p> Open /proc/$PID/mem with O_DIRECT (maybe) and use pread / pwrite.<br> Or maybe readv/writev.<br> <p> I don't see the need to invent a new syscall (unless maybe preadv/pwritev would be helpful).<br> </div> Thu, 16 Sep 2010 04:12:12 +0000
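For reference, here is a minimal C sketch of the pread()-on-/proc/$PID/mem approach neilbrown describes in the last comment above. It assumes the caveat nikanth raises (the caller needs ptrace permission on, and on kernels of that era typically an active ptrace attachment to, the target) and that the remote address and length were exchanged out of band; read_remote() and its parameters are hypothetical names used only for illustration, not an existing API.
<pre>
/* Sketch of the /proc/$PID/mem idea from the thread.  Assumptions: the
 * caller may ptrace the target, and the remote virtual address and length
 * were exchanged out of band (e.g. over a pipe set up earlier). */
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

static ssize_t read_remote(pid_t pid, void *local_buf, size_t len,
                           unsigned long remote_addr)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/mem", (int)pid);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* The file offset is simply the remote virtual address. */
    ssize_t n = pread(fd, local_buf, len, (off_t)remote_addr);
    close(fd);
    return n;
}
</pre>
The write direction would use pwrite() on a writable descriptor, subject to the same ptrace-based permission rules discussed above; intgr's N<sup>2</sup> file-handle concern comes from keeping one such descriptor open per peer rather than reopening the file for every message.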
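The vmsplice(2) path wahern mentions, where only the sender needs to know about the special interface and the receiver keeps calling read() on the other end of a pipe, might look roughly like the sketch below. It deliberately leaves out the synchronization problem the article and comments point to: nothing here tells the sender when the gifted pages may be reused or freed.
<pre>
/* Sketch of the sender-only vmsplice() idea: the sender pushes its buffer
 * into a pipe, the receiver just read()s the pipe.  Not shown: the
 * signaling needed so the sender knows when the buffer may be reused. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/uio.h>

static ssize_t send_buffer(int pipe_write_fd, void *buf, size_t len)
{
    struct iovec iov = {
        .iov_base = buf,
        .iov_len  = len,
    };

    /* SPLICE_F_GIFT marks the pages as movable rather than copyable;
     * whether a copy is actually avoided depends on what happens at the
     * other end of the pipe. */
    return vmsplice(pipe_write_fd, &iov, 1, SPLICE_F_GIFT);
}
</pre>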
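Finally, the plain-shared-memory baseline that vomlehn and ejr weigh the new interface against is the familiar shm-plus-semaphore pattern. A rough sketch follows; SHM_NAME, the 4 KiB size, and open_channel() are arbitrary choices for illustration, and it shows where the one copy ejr mentions ends up (the memcpy() into the shared segment).
<pre>
/* Sketch of the conventional shared-memory baseline: a POSIX shm segment
 * plus a process-shared semaphore used as a "message ready" signal.
 * Error handling trimmed; link with -lrt -lpthread. */
#include <fcntl.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHM_NAME "/fast-ipc-demo"   /* arbitrary name for illustration */
#define SHM_SIZE 4096

struct channel {
    sem_t ready;                           /* posted by the sender */
    char  data[SHM_SIZE - sizeof(sem_t)];  /* message payload */
};

static struct channel *open_channel(int create)
{
    int fd = shm_open(SHM_NAME, create ? O_CREAT | O_RDWR : O_RDWR, 0600);
    if (fd < 0)
        return NULL;
    if (create && ftruncate(fd, SHM_SIZE) < 0) {
        close(fd);
        return NULL;
    }

    struct channel *ch = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    close(fd);
    if (ch == MAP_FAILED)
        return NULL;
    if (create)
        sem_init(&ch->ready, 1 /* process-shared */, 0);
    return ch;
}

/* Sender: memcpy() the message into ch->data, then sem_post(&ch->ready).
 * Receiver: sem_wait(&ch->ready), then read ch->data.  That memcpy() is
 * the copy that "someone else" makes in any shared-memory transport, per
 * the zero-copy definition quoted above. */
</pre>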