LWN: Comments on "Relief for retpoline pain"

Relief for retpoline pain

teknoraver — Fri, 04 Jan 2019 12:05:37 +0000

Awesome work!

Relief for retpoline pain

wtarreau — Fri, 21 Dec 2018 12:56:02 +0000

The world is continuously redoing the same things. I used to do this almost 10 years ago in haproxy ( http://git.haproxy.org/?p=haproxy.git;a=commitdiff;h=531cf0 ) and slightly more than a year ago, when explaining this code to someone, I said "I know it looks strange, this is old, dating from when CPUs were not able to predict indirect branches, now we could get rid of this". Then spectre/meltdown arrived and I was very happy not to have touched that code :-)

Relief for retpoline pain

mp — Thu, 20 Dec 2018 13:25:48 +0000

This comment seems to nicely illustrate the fact that "relpoline" is indeed a name too close to "retpoline" for comfort.

Relief for retpoline pain

roc — Tue, 18 Dec 2018 21:17:23 +0000

In large applications the indirect branch predictor runs out of capacity so inline caches are still very useful.

Relief for retpoline pain

anton — Tue, 18 Dec 2018 18:27:38 +0000

Indirect function calls [...] have never been blindingly fast

Actually, in my measurements correctly predicted indirect calls have been as fast as direct calls on Intel-compatible CPUs for a decade or two. That obviated the need for inline caching, so it's not surprising that all the papers on inline caching are more than two decades old.

Relief for retpoline pain

jezuch — Tue, 18 Dec 2018 11:59:20 +0000

In compilers this is called devirtualization and recent GCC versions can do this automatically for C++ at least. Java's JIT does this too and it's one of the biggest advantages of JIT over AOT as it knows for real what can and cannot be called and what is the distribution of probabilities of targets. A very smart compiler could in theory recognize the pattern in C and optimize it too, but since this is not a concept of the language itself, I wouldn't count on it really.

Relief for retpoline pain

jcm — Sat, 15 Dec 2018 19:32:48 +0000

* The implementation turned out to be a nightmare, not the concept. It's ok to speculate into branches, you just need to tag the BTB with enough disambiguating context.

Relief for retpoline pain

jcm — Sat, 15 Dec 2018 19:26:16 +0000

Retpolines don't prevent speculation, they just give the branch prediction logic a harmless path to speculate into. Speculation occurs into an infinite loop to self (with an optimization hint to the hw via a "pause" instruction so it doesn't actually consume cycles on the loop).

Relief for retpoline pain

ibukanov — Sat, 15 Dec 2018 09:55:59 +0000

It was not only dynamic languages. Some compilers for object-oriented languages replace virtual calls with a few ifs that check for all known classes and call the corresponding method statically. This was done, for example, in the SmallEiffel compiler 20 years ago.

The indirect branch prediction on CPU made that optimization largely unnecessary, but now we are back to it as the prediction turned out to be a security nightmare.
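
A minimal C sketch of that "few ifs" transformation, applied to a function pointer rather than a virtual call. The handler names and their return values are hypothetical, chosen only for illustration; this is not the SmallEiffel or kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_fn)(int);

/* Hypothetical handlers standing in for the known call targets. */
static int handle_tcp(int len) { return len + 20; }
static int handle_udp(int len) { return len + 8; }
static int handle_raw(int len) { return len; }

/*
 * Compare the pointer against the expected targets and call the matching
 * function directly. Only an unexpected target takes the indirect call
 * (which, in a retpoline-enabled kernel, would be the expensive path).
 */
static int dispatch(handler_fn fn, int len)
{
    assert(fn != NULL);
    if (fn == handle_tcp)
        return handle_tcp(len);   /* direct call, conditional branch only */
    if (fn == handle_udp)
        return handle_udp(len);   /* direct call, conditional branch only */
    return fn(len);               /* fallback: ordinary indirect call */
}
```

Calling dispatch(handle_raw, n) still works, since handle_raw is not in the list of checked targets and simply goes through the fallback.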

Relief for retpoline pain

ibukanov — Sat, 15 Dec 2018 09:35:19 +0000

The branch predictor for indirect calls is shared, and unrelated processes can make it speculatively jump to an arbitrary address. The conditional direct jumps used by if statements can only jump to the wrong branch of the if. An exploit is possible only when the code uses a particular, not-so-frequent pattern, and the defense, when necessary, does not cost as much as trampolines.

Relief for retpoline pain

pbonzini — Sat, 15 Dec 2018 09:34:26 +0000

All these optimizations are suspiciously similar to the "inline caches" used to optimize method calls in dynamic languages!

Relief for retpoline pain

zev — Sat, 15 Dec 2018 08:39:36 +0000

For a research project a few years ago I set up a prototype system somewhat similar to the "optpolines" described here -- it used perf to profile a running workload and discover common indirect call targets, and then took a whole syscall path and used LTO to compile a version of it with all indirect calls de-indirected and even inlined (with a guard check that fell back to the original code of course) to generate an optimized version of the hot code path for that specific running system (from syscall entry points all the way down to device drivers), which it then spliced into the running system as a livepatch.

While I was working on it the results weren't quite dramatic enough to justify pursuing it further, but this was well before Spectre -- perhaps it just wasn't timed right...
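
The "discover common indirect call targets" step could be sketched roughly as below. This is an assumption-laden illustration, not the prototype described above: that system used perf, whereas this sketch keeps a small in-process counter table per call site. The names call_site_profile, profile_record, profile_hottest, target_a and target_b are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TARGETS 4

typedef void (*target_fn)(void);

/* Per-call-site histogram of observed targets (hypothetical layout). */
struct call_site_profile {
    target_fn targets[MAX_TARGETS];
    unsigned long hits[MAX_TARGETS];
    size_t used;
    unsigned long overflow;   /* calls whose target did not fit in the table */
};

/* Hypothetical targets, present only so the sketch is self-contained. */
static void target_a(void) {}
static void target_b(void) {}

/* Record one observed call through this site. */
static void profile_record(struct call_site_profile *p, target_fn fn)
{
    assert(p != NULL);
    for (size_t i = 0; i < p->used; i++) {
        if (p->targets[i] == fn) {
            p->hits[i]++;
            return;
        }
    }
    if (p->used < MAX_TARGETS) {
        p->targets[p->used] = fn;
        p->hits[p->used] = 1;
        p->used++;
    } else {
        p->overflow++;
    }
}

/* Return the most frequently seen target, or NULL if none was recorded. */
static target_fn profile_hottest(const struct call_site_profile *p)
{
    target_fn best = NULL;
    unsigned long best_hits = 0;

    assert(p != NULL);
    for (size_t i = 0; i < p->used; i++) {
        if (p->hits[i] > best_hits) {
            best_hits = p->hits[i];
            best = p->targets[i];
        }
    }
    return best;
}
```

The hottest target is the one a guarded direct call would then check for first, with the original indirect call kept as the fallback.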

Relief for retpoline pain

areilly — Sat, 15 Dec 2018 07:26:11 +0000

Sure, in fact you hope that it will: then the cost of those if() branches will be zero. It's a different piece of the branch predictor, though, than the one that was using a poison-able/shared target cache. The if() way may be able to be biased to speculate the wrong call, but it will still only call one of the functions you've compiled into your code, not an exploit. Also there is probably much less chance of causing the function pointer to be wildly wrong, compared to a wildly-wrong out-of-range array index. Ideally though you'd fix the hardware, so that the various "hidden" cache state was localized to protection domains along with the rest of memory....

Relief for retpoline pain

patrakov — Sat, 15 Dec 2018 06:01:01 +0000

I don't fully understand how relpolines prevent speculation. Won't the CPU itself also learn the most common case and speculate along it? "OK, this if usually takes the true branch, and then there is a direct call right there, and then it loads this yummy stuff into memory, let's do that speculatively".

Relief for retpoline pain

josh — Sat, 15 Dec 2018 05:52:17 +0000

There are other good reasons to optimize indirect calls into direct ones. If you can figure out what code can and can't be called by a function pointer, you could optimize out the code that can't be called, and even inline the only possible code in a given kernel configuration.