If you use OpenMP (which has a really nasty #pragma-based syntax but works
quite well, and which recent GCCs support natively) then there is no need
to recompile for each target machine at all: you compile once with OpenMP
enabled, and the runtime linked into the program probes for the number of
cores at startup. I see no reason why an enhanced OpenMP runtime couldn't
also probe for a GPGPU and run bits on there if it wanted to, all without
any need for this elaborate IR-shipping you seem to be discussing.
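
To make that concrete, here is a minimal sketch of the sort of thing I
mean. The function names (worker_count, sum_of_squares) are made up for
illustration; the only real OpenMP pieces are the pragma and
omp_get_max_threads(). The same source builds with or without OpenMP,
and nothing in it hardcodes a core count.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: how many threads the OpenMP runtime would use for
 * a parallel region on this machine. Falls back to 1 when the program
 * was built without OpenMP support. */
int worker_count(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Hypothetical workload: sum of i*i for i in [0, n). The pragma asks the
 * runtime to split the loop across however many threads it chose at
 * startup; the reduction clause combines the per-thread partial sums.
 * Without OpenMP the pragma is ignored and the loop runs serially. */
long long sum_of_squares(int n)
{
    long long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++)
        total += (long long)i * i;
    return total;
}
```

With GCC you would build this with -fopenmp. The resulting binary picks
its thread count when it runs, and you can override that choice with the
OMP_NUM_THREADS environment variable, again without recompiling.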