No, you are missing my point. When I wrote "all processes related to a given job", I really meant all processes, on all cluster nodes.
Yes, it's complicated. But with the help of the MPI library you can close all connections in a synchronized way, since all inter-node connections are supposed to go through MPI. This is what BLCR + OpenMPI already do.
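To make the "synchronized way" a bit more concrete, here is a toy sketch of the idea. It is not BLCR or OpenMPI code and does not use MPI at all: threads stand in for the job's processes on different nodes, a `threading.Barrier` stands in for whatever coordination the MPI library does, and `coordinated_checkpoint`, `connections_open` and `snapshots` are names I made up for illustration.

```python
import threading


def coordinated_checkpoint(num_ranks=4):
    """Simulate ranks closing their links together before a checkpoint."""
    barrier = threading.Barrier(num_ranks)
    connections_open = [True] * num_ranks  # one hypothetical link per rank
    snapshots = [None] * num_ranks

    def rank_main(rank):
        # Phase 1: every rank stops sending and waits for all the others.
        barrier.wait()
        # Phase 2: nothing is in flight any more, so close the link.
        connections_open[rank] = False
        barrier.wait()
        # Phase 3: every link is closed everywhere; save local state.
        snapshots[rank] = {"rank": rank, "links_open": any(connections_open)}
        barrier.wait()
        # Phase 4: all snapshots are taken; reopen the link and resume.
        connections_open[rank] = True

    threads = [
        threading.Thread(target=rank_main, args=(r,)) for r in range(num_ranks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return snapshots, connections_open
```

Calling `coordinated_checkpoint(4)` returns one snapshot per simulated rank, each taken while no link was open, plus the final link states after everything has been reopened. The point is only that no process saves its state until every process has closed its connections.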