> Short version seems to be that on large multicore systems, ccNUMA doesn't scale, and explicit message passing between cores and sockets seems to be a good idea, though I'm not quite clear how exactly the kernels running on each core pass messages between the cores.
I think the idea is that the message-passing code treats the different interconnects much like we treat network interfaces: each one has its own signalling mechanism and throughput characteristics. If one kernel wants to talk to a core on the same chip, it uses a different (optimised) interface from the one used for talking to the other socket on the same motherboard, or to a core on the GPU.
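To make that concrete, here's a minimal sketch of the dispatch idea: pick a message-passing backend based on where the destination core lives. The channel names, the `locality_of` topology lookup, and the core-number thresholds are all invented for illustration; this isn't Barrelfish's actual API, where topology would come from hardware discovery.

```c
#include <stdio.h>

typedef enum { SAME_DIE, SAME_BOARD, REMOTE_NODE } locality_t;

typedef struct {
    const char *name;
    void (*send)(int dest_core, const char *msg);
} channel_t;

/* Stand-ins for the real transports: shared cache lines on-die,
 * the socket interconnect across the board, the NIC beyond it. */
static void send_cache_line(int c, const char *m)   { printf("[cache-line]   -> core %d: %s\n", c, m); }
static void send_interconnect(int c, const char *m) { printf("[interconnect] -> core %d: %s\n", c, m); }
static void send_network(int c, const char *m)      { printf("[network]      -> core %d: %s\n", c, m); }

static channel_t channels[] = {
    [SAME_DIE]    = { "shared cache line",  send_cache_line },
    [SAME_BOARD]  = { "QPI/HyperTransport", send_interconnect },
    [REMOTE_NODE] = { "Ethernet",           send_network },
};

/* Hypothetical topology lookup; thresholds are made up. */
static locality_t locality_of(int dest_core) {
    if (dest_core < 4)  return SAME_DIE;
    if (dest_core < 16) return SAME_BOARD;
    return REMOTE_NODE;
}

static void msg_send(int dest_core, const char *msg) {
    channels[locality_of(dest_core)].send(dest_core, msg);
}

int main(void) {
    msg_send(2,  "hello, sibling core");
    msg_send(9,  "hello, other socket");
    msg_send(42, "hello, other machine");
    return 0;
}
```

The point is that the caller just says "send to core N" and the routing layer picks whichever signalling mechanism suits that hop.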
I'm glad you showed me the Barrelfish paper, because it gave me an insight: if we're using 'network'-style interfaces to talk between kernels, then we can use actual network interfaces as well. Why not boot up another machine and have its kernels 'join' your machine's kernels across the network? Sure, the latencies are much larger, there's no shared memory, and the failure modes are different; but it's just another interface, and those concepts are already well understood. Processors could also join and leave a running set of kernels on an as-needs basis.
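A rough sketch of what that 'join' might look like, under the assumption that a joining machine just announces its cores and the local side adds routes for them behind a network channel. The message layout, the `handle_join`/`handle_leave` names, and the `tcp:host` route encoding are all made up here, not anything from the paper.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CORES 64

typedef struct {
    int  core_id;    /* global core number */
    char via[32];    /* which interface reaches it */
} route_t;

static route_t routes[MAX_CORES];
static int n_routes = 0;

/* What a joining machine might announce: who it is, which cores it offers. */
typedef struct {
    char hostname[32];
    int  first_core;
    int  core_count;
} join_msg_t;

static void handle_join(const join_msg_t *m) {
    for (int i = 0; i < m->core_count && n_routes < MAX_CORES; i++) {
        routes[n_routes].core_id = m->first_core + i;
        snprintf(routes[n_routes].via, sizeof routes[n_routes].via,
                 "tcp:%s", m->hostname);
        n_routes++;
    }
}

/* Leaving is just dropping that host's routes: cores come and go as needed. */
static void handle_leave(const char *hostname) {
    char via[32];
    snprintf(via, sizeof via, "tcp:%s", hostname);
    int w = 0;
    for (int r = 0; r < n_routes; r++)
        if (strcmp(routes[r].via, via) != 0)
            routes[w++] = routes[r];
    n_routes = w;
}

int main(void) {
    join_msg_t m = { "otherbox", 64, 4 };
    handle_join(&m);
    for (int i = 0; i < n_routes; i++)
        printf("core %d reachable via %s\n", routes[i].core_id, routes[i].via);
    handle_leave("otherbox");
    printf("after leave: %d routes\n", n_routes);
    return 0;
}
```

Once remote cores are in the same routing table as local ones, sending to them is just the `REMOTE_NODE` case of the dispatch above, with bigger latencies and different failure handling.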
It brings ubiquitous, scalable computing that bit closer.