>Reliable multicast is fundamentally something that doesn't belong in the kernel. The kernel handles discrete operations, and leaves user land to manage long-term state with unbounded time and space requirements.
Embarrassingly parallel algorithms get a big boost from shared data structures when memory-access latency can be absorbed by L3-tier caches. The wider the compute capabilities of the processor - and the future is wide, if not clock-speed fast - the better.
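
To make that concrete, here is a rough sketch of the shape I have in mind: independent workers that all read one small shared table and write only to their own local totals. The names (`TABLE`, `partial_sum`, `parallel_sum`) and the table size are made up for illustration, and CPython threads won't actually run this in parallel because of the GIL, so treat it as a picture of the access pattern rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical read-only lookup table shared by every worker.
# The size is an assumption: small enough to plausibly stay cache-resident.
TABLE = [i * i for i in range(1024)]


def partial_sum(chunk):
    # Each worker only reads the shared table and keeps its own local
    # total, so no locking or coordination between workers is needed.
    return sum(TABLE[i % len(TABLE)] for i in chunk)


def parallel_sum(n_items, n_workers=4):
    # Split the index range into disjoint strided chunks, one per worker.
    chunks = [range(w, n_items, n_workers) for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Combine the independent partial results at the end.
        return sum(pool.map(partial_sum, chunks))


if __name__ == "__main__":
    print(parallel_sum(10_000))
```

Because the workers never write to shared state, adding more of them only adds more readers of the same table, which is the case where a wide processor with a large shared cache should help most.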