Still far from proprietary MPI implementations

Posted Sep 16, 2010 12:03 UTC (Thu) by Np237 (subscriber, #69585)
Parent article: Fast interprocess messaging

Ideally the kernel should not copy the data at all, but instead provide a way to map memory pages belonging to one process into the other process's address space, marking them copy-on-write.

With copy_to_process you have one copy instead of two, but there should be zero copies for MPI to perform better.
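
(For reference: the copy_to_process() interface discussed in the article is similar in spirit to what eventually shipped as process_vm_writev(2). Below is a minimal sketch of a single-copy transfer using that call; the peer pid and the remote buffer address are assumed to have been exchanged out of band, e.g. over a pipe or the MPI bootstrap channel.)

    #define _GNU_SOURCE
    #include <sys/uio.h>       /* process_vm_writev(), struct iovec */
    #include <sys/types.h>     /* pid_t, ssize_t, size_t */

    /* Single-copy send: the kernel copies straight from this process's
     * buffer into the peer's buffer, with no intermediate shared buffer. */
    static ssize_t send_to_peer(pid_t peer, void *remote_buf,
                                const void *local_buf, size_t len)
    {
        struct iovec local  = { .iov_base = (void *)local_buf, .iov_len = len };
        struct iovec remote = { .iov_base = remote_buf,        .iov_len = len };

        return process_vm_writev(peer, &local, 1, &remote, 1, 0);
    }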


Still far from proprietary MPI implementations

Posted Sep 16, 2010 12:14 UTC (Thu) by Trelane (subscriber, #56877) [Link]

I wonder if there's some way to swap a page or a number of pages between processes.

Still far from proprietary MPI implementations

Posted Sep 16, 2010 12:50 UTC (Thu) by nix (subscriber, #2304) [Link]

Wasn't this sort of thing what the old skas patch for user-mode-linux used to do?

Still far from proprietary MPI implementations

Posted Sep 16, 2010 12:58 UTC (Thu) by ejr (subscriber, #51652) [Link]

There's a definition of zero copy floating around, often attributed to Don Becker: zero copy means someone *else* makes the copy.

That is more or less what happens in message passing over any shared-memory mechanism. What you are describing is plain shared memory. It's perfectly fine to use within a single node, and I've done exactly that within MPI jobs working off large, read-only data sets, with good results (transparent memory scaling of the data set when you're running multiple MPI processes on one node). But it's not so useful for implementing MPI itself.
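
(A minimal sketch of that pattern using POSIX shared memory; the object name and the caller-supplied size are illustrative, and the object is assumed to have been created and populated beforehand by one rank per node.)

    #include <fcntl.h>         /* O_RDONLY */
    #include <sys/mman.h>      /* shm_open(), mmap() */
    #include <unistd.h>        /* close() */

    /* Each MPI rank on the node maps the same read-only data set, so the
     * node holds one physical copy no matter how many ranks run on it.
     * Link with -lrt on older glibc. */
    static const double *map_dataset(size_t len)
    {
        int fd = shm_open("/dataset", O_RDONLY, 0);   /* hypothetical name */
        if (fd < 0)
            return NULL;

        void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);

        return p == MAP_FAILED ? NULL : p;
    }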

The interface here would help MPI when the receiver has already posted its receive by the time the send occurs. You then have the one necessary copy rather than two. Also, this interface has the *potential* to be smart about cache invalidation by avoiding caching the output on the sending processor! That is a serious cost; a shared buffer ends up bouncing between processors.

Still far from proprietary MPI implementations

Posted Sep 16, 2010 13:20 UTC (Thu) by Np237 (subscriber, #69585) [Link]

Indeed, that makes the performance much less predictable. I wonder how well this behaves on real-life codes, though. At least Bull claims its MPI implementation does this, and its single-node performance is impressive.

Still far from proprietary MPI implementations

Posted Sep 17, 2010 5:38 UTC (Fri) by rusty (subscriber, #26) [Link]

> With copy_to_process you have one copy instead of two, but there should be
> zero copies for MPI to perform better.

I'm not so sure. I believe that a future implementation could well remap pages, but playing with mappings is not as cheap as you might think, especially if you want the same semantics: the process the pages come from will need to mark them R/O so it can COW them if it tries to change them.

I'm pretty sure we'd need a TLB flush on every CPU either task has run on. Nonetheless, you know how much data is involved, so if it turns out to be dozens of pages it might make sense. With huge pages or only a few KB of data, not so much.

And if you're transferring MB of data over MPI, you're already in a world of suck, right?

Cheers,
Rusty.

Still far from proprietary MPI implementations

Posted Jan 7, 2012 0:13 UTC (Sat) by wahern (subscriber, #37304) [Link]

I believe that was also Linus' analysis 6 years ago:

http://lists.freebsd.org/pipermail/freebsd-arch/2006-Apri...

