If you look at some of the older LLVM papers, they pretty much describe doing this. I don't know whether anyone ever implemented it, but since LLVM already has a JIT compiler for its IR, I think you could probably do it in a limited form today: see http://llvm.org/cmds/lli.html, the current LLVM command that runs an LLVM bitcode file with the LLVM JIT.
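
For what it's worth, here is a rough sketch of what that "limited form" could look like, assuming `clang` and `lli` are on your PATH. The helper names and the `hello.c` file are made up for illustration and are not part of LLVM.

```python
import shutil
import subprocess
from pathlib import Path


def bitcode_commands(source):
    """Return two command lines: compile C to LLVM bitcode, then run it with lli."""
    # Hypothetical naming convention: hello.c -> hello.bc next to the source.
    bitcode = str(Path(source).with_suffix(".bc"))
    compile_cmd = ["clang", "-O0", "-emit-llvm", "-c", source, "-o", bitcode]
    run_cmd = ["lli", bitcode]
    return compile_cmd, run_cmd


def run_via_lli(source):
    """Compile and run `source` through lli; return None if the tools are missing."""
    if shutil.which("clang") is None or shutil.which("lli") is None:
        return None
    compile_cmd, run_cmd = bitcode_commands(source)
    subprocess.run(compile_cmd, check=True)
    # The exit code of the program as executed by lli.
    return subprocess.run(run_cmd).returncode
```

Calling `run_via_lli("hello.c")` would ship only the `.bc` file to the JIT; nothing here does the profile-and-write-back part the papers describe.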
The papers talk about profiling and optimizing the IR, then writing the result back to the binary, so you end up with a binary optimized for your workload.
This still has the issue of library incompatibilities across architectures (even within the same distro), since a library may not have all the same options compiled in, or may export a slightly different set of symbols, among all kinds of other things.
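
To make the symbol problem concrete, here is a small sketch that diffs the exported symbols of two builds of the same library, assuming you have captured the output of `nm -D --defined-only` for each. The library and symbol names below are invented sample data, not taken from any real package.

```python
def exported_symbols(nm_output):
    """Collect symbol names from `nm -D --defined-only` style output."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Expected shape: "<address> <type> <name>"; skip anything else.
        if len(parts) == 3:
            symbols.add(parts[2])
    return symbols


def missing_symbols(build_a, build_b):
    """Symbols exported by build A that build B does not export."""
    return sorted(exported_symbols(build_a) - exported_symbols(build_b))


# Hypothetical nm output for the "same" library on two architectures.
x86_build = "0000000000001130 T foo_init\n0000000000001180 T foo_ssl_connect\n"
arm_build = "0000000000000f10 T foo_init\n"

print(missing_symbols(x86_build, arm_build))
```

Any name that shows up in that list is one that IR built against the first library would fail to resolve against the second.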