AME sounds like BSP (Bulk Synchronous Parallel), which divides programs into independent supersteps separated by global barriers, with no communication except at the end of a superstep. There is an efficient free C library that implements BSP and does not require any compiler support whatsoever. It supports Linux, Windows, etc., and runs over TCP, MPI, shared memory, etc.
The deterministic nature of BSP avoids turning 5-line programs into 10-page monographs, even without the simplifying assumption that x=x+1 is atomic.
An implementation of BSP can actually do the communication earlier than the end of a superstep, provided the transferred data is not visible to the receiver until the next superstep. Many problems require a very small number of supersteps, so the impact of the global barriers is not severe.
I find it hard to buy the idea that AME can't be implemented in CPython.
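To make the superstep model concrete, here is a toy sketch in Python threads. It is not the C library mentioned above and does not use its API; the class `BSPMachine` and its `send`/`sync` methods are hypothetical names for illustration. Messages sent during a superstep are buffered and only delivered after a global barrier, so every process sees the same data regardless of thread scheduling. Under the GIL this gives no real speedup; it only demonstrates the semantics.

```python
import threading


class BSPMachine:
    """Hypothetical toy BSP runtime: messages sent in superstep s are
    visible to the receiver only in superstep s + 1."""

    def __init__(self, nprocs):
        self.nprocs = nprocs
        self.barrier = threading.Barrier(nprocs)
        self.outboxes = [[] for _ in range(nprocs)]  # pending, per destination
        self.lock = threading.Lock()

    def send(self, dest, msg):
        # Buffered: the receiver cannot observe this until after sync().
        with self.lock:
            self.outboxes[dest].append(msg)

    def sync(self, pid):
        # Global barrier: every process has finished sending.
        self.barrier.wait()
        # Each process drains only its own outbox; sorting makes the
        # delivery order independent of thread scheduling.
        delivered = sorted(self.outboxes[pid])
        self.outboxes[pid] = []
        # Second barrier so no one starts next-superstep sends too early.
        self.barrier.wait()
        return delivered


def run(nprocs=4):
    bsp = BSPMachine(nprocs)
    results = [None] * nprocs

    def program(pid):
        local = pid + 1                  # superstep 0: local computation only
        for dest in range(nprocs):
            bsp.send(dest, local)        # communication is deferred
        received = bsp.sync(pid)         # superstep boundary
        results[pid] = sum(received)     # superstep 1 uses delivered data

    threads = [threading.Thread(target=program, args=(p,)) for p in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With four processes each contributing 1 to 4, every process ends up with the same all-reduce result, 10, on every run.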
Posted Aug 10, 2012 12:41 UTC (Fri) by njs (guest, #40338)
BSP requires programmers to be explicit about communication. The goal of the STM stuff he's talking about is to preserve the shared-everything threaded memory model for inter-thread communication, and automatically detect and serialize these "communication events" (i.e., arbitrary memory accesses).
I have no idea why this is the goal, because shared-everything is horrible for programmers and implementors alike. I guess that makes it a fun engineering problem. It's like if Oulipo designed runtime environments.
Rigo: Multicore Programming in PyPy and CPython
Posted Aug 10, 2012 16:13 UTC (Fri) by drag (subscriber, #31333)
> I have no idea why this is the goal, because shared-everything is horrible for programmers and implementors alike. I guess that makes it a fun engineering problem. It's like if Oulipo designed runtime environments.
Isn't the major advantage of threading versus forking in Linux the ability to have low-overhead IPC through things like shared memory and such?
Otherwise what is the advantage over forking?
Just curious.
Posted Aug 10, 2012 18:08 UTC (Fri) by njs (guest, #40338)
If we're using terms like BSP, STM, shared-everything, etc., then we're talking about what semantics your runtime provides, not how it's implemented in terms of OS primitives.
Right now Python on Linux has two easy options for parallelism. You can fork(), which gives you shared-nothing parallelism (which is great, very easy to reason about) with somewhat cumbersome and expensive IPC (serialized objects copied explicitly over sockets or shared memory, explicit chunk-of-shared-bytes via mmap, that sort of thing). Or you can use "threads", and get shared-everything parallelism (which is horrible) with fast IPC, except the GIL kind of kills that.
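The difference between the two options can be seen in a small sketch. The helper names are hypothetical, and it assumes a POSIX system, since it asks `multiprocessing` explicitly for the `fork` start method. A thread mutates the very list the caller holds, while a forked child only mutates its own copy and has to send results back as a serialized object over a queue.

```python
import multiprocessing as mp
import threading


def append_item(items, q=None):
    items.append("child")
    if q is not None:
        q.put(list(items))  # pickled and copied back to the parent


def run_in_thread():
    items = []
    t = threading.Thread(target=append_item, args=(items,))
    t.start()
    t.join()
    return items  # shared-everything: the thread's mutation is visible


def run_in_process():
    ctx = mp.get_context("fork")  # POSIX-only assumption
    items = []
    q = ctx.Queue()
    p = ctx.Process(target=append_item, args=(items, q))
    p.start()
    child_view = q.get()  # read before join() to avoid a full-pipe deadlock
    p.join()
    return items, child_view  # shared-nothing: parent's list is untouched
```

Here `run_in_thread()` returns `["child"]`, while `run_in_process()` returns `([], ["child"])`: the parent only learns about the child's work through the explicit, copied message.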
But you can imagine PyPy implementing fork()-like shared-nothing semantics *with* fast IPC. It'd use pthreads underneath, but with each thread having a totally different Python namespace, and when you passed an object to another thread it would just mark it copy-on-write instead of actually making a copy. This is how languages designed for parallelism, like Erlang or Rust, are intended to work.
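A rough approximation of that model in today's Python is a thread that owns its state privately and only receives messages through a queue. This is a sketch under loud assumptions: `Actor` is a hypothetical class, Python cannot enforce the isolation (it is a coding discipline here, not a runtime guarantee), and an eager `copy.deepcopy` stands in for the copy-on-write marking the runtime described above would do.

```python
import copy
import queue
import threading


class Actor:
    """Hypothetical shared-nothing worker: private state, mailbox-only input."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self.results = queue.Queue()
        self._total = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, msg):
        # Eager copy as a stand-in for copy-on-write: later changes by the
        # sender cannot leak into the message the receiver sees.
        self.mailbox.put(copy.deepcopy(msg))

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:
                self.results.put(self._total)
                return
            msg.append(0)  # the receiver may freely mutate its own copy
            self._total += sum(msg)

    def stop(self):
        self.mailbox.put(None)
        total = self.results.get()
        self._thread.join()
        return total


def demo():
    data = [1, 2, 3]
    actor = Actor()
    actor.send(data)
    data.append(100)  # does not affect the already-sent message
    actor.send(data)
    return actor.stop(), data
```

`demo()` returns `(112, [1, 2, 3, 100])`: the actor saw `[1, 2, 3]` and then `[1, 2, 3, 100]` no matter when its thread ran, and its own `append(0)` never touched the sender's list.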
Implementing this on top of CPython would be harder, and you'd have to have a whole transition period while people audited their existing C modules to adapt them to handle GIL-less operation and the new copy-on-write stuff, but it'd be possible. And seems more likely to me than designing a new C-compatible language like Armin is suggesting. I doubt it's worth it for a relatively minor IPC speedup, though...
Posted Aug 16, 2012 14:55 UTC (Thu) by martin.langhoff (subscriber, #61417)
Yes! Having an Erlang-ish map() in Python, that transparently dispatches the work to a number of processes/threads would be fantastic.
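The standard library already offers a process-based `map()` of this kind in `multiprocessing.Pool.map`, which splits the input across worker processes and returns results in input order. The sketch below wraps it in a hypothetical `parallel_map` helper and assumes a POSIX system because it requests the `fork` start method explicitly; the work is still shared-nothing, with arguments and results pickled between processes.

```python
import multiprocessing as mp


def square(x):
    return x * x


def parallel_map(func, items, processes=4):
    # "fork" is a POSIX-only assumption; workers inherit func directly.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(func, items)  # order of results matches input
```

For example, `parallel_map(square, range(5))` returns `[0, 1, 4, 9, 16]`. It is not fully transparent: `func` and its arguments must be picklable, and the speedup only pays off when each call does enough work to outweigh the IPC cost.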