It's not just linking that a tmpfs /tmp speeds up a bit, in theory: it's compilation, because without -pipe GCC writes its intermediate .S file to TMPDIR (and -pipe is not the default: obviously it speeds up compilation by allowing extra parallelism as well as reducing potential disk I/O, so I don't quite understand *why* it's still not the default, but there you are.)
btw, coreutils is by nobody's standards 'something having a lot of binaries'. It has relatively few very small binaries, few object files, and an enormous configure script that takes about 95% of the configure/make time (some of which, it is true, runs the compiler and writes to TMPDIR, but most of which is more shell-dependent than anything). LTO time will also have minimal impact in this build.
But, you're right, I'm pontificating in the absence of data -- or data less than eight years old, anyway, as the last time I measured this was in 2004. That's so out of date as to be useless. Time to measure again. But let's use some more hefty test cases than coreutils, less dominated by weird marginal workloads like configure runs.
Let's try a full build of something with more object files, and investigate elapsed time, cpu+sys time, and (for non-tmpfs) disk I/O time as measured from /proc/diskstats (thus, possibly thrown off by cross-fs merging: this is unavoidable, alas). A famous old test, the kernel (hacked to not use -pipe, with hot cache), shows minimal speedup, since the kernel does a multipass link process and writes the intermediates to non-$TMPDIR anyway:
So, a definite effect, but not a huge one. I note that the effect of -pipe is near-nil these days, likely because the extra parallelism you get from combining the compiler and assembler is just supplanting the extra parallelism you would otherwise get by running multiple copies of the compiler in parallel via make -j. (On a memory-constrained or disk-constrained system, where the useless /tmp writes may contend with useful disk reads, and where reads may be required as well, we would probably see a larger effect, but this system has 24Gb RAM and a caching RAID controller atop disks capable of 250Mb/s in streaming write, so it is effectively unconstrained, being quite capable of holding the whole source tree and all build products in RAM simultaneously. So this is intentionally a worst case for my thesis. Smaller systems will see a larger effect. Most systems these days are not I/O- or RAM-constrained when building a kernel, anyway.)
How about a real 900kg monster of a test, GCC? This one has everything, massive binaries, massive numbers of object files, big configure scripts writing to TMPDIR run in parallel with ongoing builds, immense link steps, you name it: if there is an effect this will show it. (4.6.x since that's what I have here right now: full x86_64/x86 multilibbed biarch nonprofiled -flto=jobserver -j 9 bootstrap including non-multilib libjava, minus testsuite run: hot cache forced by cp -a'ing the source tree before building; LTO is done in stage3 but in no prior stages so as to make the comparison with the next test a tiny bit more meaningful: stage2/3 comparison is suppressed for the same reason):
So, no significant effect elapsed-time-wise, well into the random noise: though the system time is noticeably higher for the non-tmpfs case, it is hugely dominated by the actual compilation. However, if you were doing anything else with the system you would have noticed: paging was intense, as you'd expect with around 10Gb of useless writes being flushed to disk. Any single physical disk would have been saturated, and a machine with much less memory would have been waiting on it.
That's probably the most meaningful pair of results here, a practical worst case for the CPU overhead of non-tmpfs use. Note that the LTO link stage alone writes around six gigabytes to TMPDIR, with peak usage at any one time around 4Gb, and most of this cannot be -pipe'd (thus this is actually an example of something that on many machines cannot be tmpfsed effectively).