AME sounds like BSP (Bulk Synchronous Parallel), which divides programs into independent supersteps separated by global barrier and no communication except at the end of a superstep. There is an efficient free C library that implements BSP and does not require any compiler support whatsoever. It supports linux, windows, etc and TCP, MPI, shared memory, etc.
The determininistic nature of BSP avoids turning 5 line programs into 10 pages of monograph with the simplifying assumption that x=x+1 is atomic.
An implementation of BSP can actually do the communication earlier than the end of a superstep provided this does not affect anything until the next superstep. Many problems require a very small number of supersteps, so the impact of the global barriers is not severe.
I find it hard to buy the idea that AME can't be implemented in CPython.