LWN: Comments on "Relief for retpoline pain" https://lwn.net/Articles/774743/ This is a special feed containing comments posted to the individual LWN article titled "Relief for retpoline pain". en-us Tue, 09 Sep 2025 18:15:34 +0000 Tue, 09 Sep 2025 18:15:34 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Relief for retpoline pain https://lwn.net/Articles/776004/ https://lwn.net/Articles/776004/ teknoraver <div class="FormattedComment"> Awesome work!<br> </div> Fri, 04 Jan 2019 12:05:37 +0000 Relief for retpoline pain https://lwn.net/Articles/775385/ https://lwn.net/Articles/775385/ wtarreau <div class="FormattedComment"> The world is continuously redoing the same things. I used to do this almost 10 years ago in haproxy ( <a href="http://git.haproxy.org/?p=haproxy.git;a=commitdiff;h=531cf0">http://git.haproxy.org/?p=haproxy.git;a=commitdiff;h=531cf0</a> ) and slightly more than a year ago, when explaining this code to someone, I said "I know it looks strange, this is old, dating from when CPUs were not able to predict indirect branches, now we could get rid of this". Then spectre/meltdown arrived and I was very happy not to have touched that code :-)<br> </div> Fri, 21 Dec 2018 12:56:02 +0000 Relief for retpoline pain https://lwn.net/Articles/775296/ https://lwn.net/Articles/775296/ mp This comment seems to nicely illustrate the fact that "relpoline" is indeed a name <em><q>too close to "retpoline" for comfort</q></em>. Thu, 20 Dec 2018 13:25:48 +0000 Relief for retpoline pain https://lwn.net/Articles/775196/ https://lwn.net/Articles/775196/ roc <div class="FormattedComment"> In large applications the indirect branch predictor runs out of capacity so inline caches are still very useful.<br> </div> Tue, 18 Dec 2018 21:17:23 +0000 Relief for retpoline pain https://lwn.net/Articles/775185/ https://lwn.net/Articles/775185/ anton <blockquote>Indirect function calls [...]
have never been blindingly fast</blockquote> Actually, in my measurements correctly predicted indirect calls have been as fast as direct calls on Intel-compatible CPUs for a decade or two. That obviated the need for inline caching, so it's not surprising that all the papers on inline caching are more than two decades old. Tue, 18 Dec 2018 18:27:38 +0000 Relief for retpoline pain https://lwn.net/Articles/775139/ https://lwn.net/Articles/775139/ jezuch <div class="FormattedComment"> In compilers this is called devirtualization and recent GCC versions can do this automatically for C++ at least. Java's JIT does this too and it's one of the biggest advantages of JIT over AOT as it knows for real what can and cannot be called and what is the distribution of probabilities of targets. A very smart compiler could in theory recognize the pattern in C and optimize it too, but since this is not a concept of the language itself, I wouldn't count on it really.<br> </div> Tue, 18 Dec 2018 11:59:20 +0000 Relief for retpoline pain https://lwn.net/Articles/775004/ https://lwn.net/Articles/775004/ jcm <div class="FormattedComment"> * The implementation turned out to be a nightmare, not the concept. It's ok to speculate into branches, you just need to tag the BTB with enough disambiguating context.<br> </div> Sat, 15 Dec 2018 19:32:48 +0000 Relief for retpoline pain https://lwn.net/Articles/775002/ https://lwn.net/Articles/775002/ jcm <div class="FormattedComment"> Retpolines don't prevent speculation, they just give the branch prediction logic a harmless path to speculate into. Speculation occurs into an infinite loop to self (with an optimization hint to the hw via a "pause" instruction so it doesn't actually consume cycles on the loop).<br> </div> Sat, 15 Dec 2018 19:26:16 +0000 Relief for retpoline pain https://lwn.net/Articles/774987/ https://lwn.net/Articles/774987/ ibukanov <div class="FormattedComment"> It was not only dynamic languages.
Some compilers for object-oriented languages replace virtual calls with a few ifs that check for all known classes and call the corresponding method statically. This was done, for example, in the SmallEiffel compiler 20 years ago. <br> <p> Indirect branch prediction on CPUs made that optimization largely unnecessary, but now we are back to it as the prediction turned out to be a security nightmare.<br> </div> Sat, 15 Dec 2018 09:55:59 +0000 Relief for retpoline pain https://lwn.net/Articles/774984/ https://lwn.net/Articles/774984/ ibukanov <div class="FormattedComment"> The branch predictor for indirect calls is shared, and unrelated processes can make it speculatively jump to an arbitrary address. The conditional direct jumps used by if statements can only jump to the wrong branch of the if. The exploit is possible only when the code uses a particular, not-so-frequent pattern, and the defense, when necessary, does not cost as much as trampolines.<br> </div> Sat, 15 Dec 2018 09:35:19 +0000 Relief for retpoline pain https://lwn.net/Articles/774986/ https://lwn.net/Articles/774986/ pbonzini <div class="FormattedComment"> All these optimizations are suspiciously similar to the "inline caches" used to optimize method calls in dynamic languages!<br> </div> Sat, 15 Dec 2018 09:34:26 +0000 Relief for retpoline pain https://lwn.net/Articles/774983/ https://lwn.net/Articles/774983/ zev <div class="FormattedComment"> For a research project a few years ago I set up a prototype system somewhat similar to the "optpolines" described here -- it used perf to profile a running workload and discover common indirect call targets, and then took a whole syscall path and used LTO to compile a version of it with all indirect calls de-indirected and even inlined (with a guard check that fell back to the original code of course) to generate an optimized version of the hot code path for that specific running system (from syscall entry points all the way down to device drivers), which it then
spliced into the running system as a livepatch.<br> <p> While I was working on it the results weren't quite dramatic enough to justify pursuing it further, but this was well before Spectre -- perhaps it just wasn't timed right...<br> <p> </div> Sat, 15 Dec 2018 08:39:36 +0000 Relief for retpoline pain https://lwn.net/Articles/774981/ https://lwn.net/Articles/774981/ areilly <div class="FormattedComment"> Sure, in fact you hope that it will: then the cost of those if() branches will be zero. It's a different piece of the branch predictor though, than the one that was using a poison-able/shared target cache. The if() way may be able to be biased to speculate the wrong call, but it will still only call one of the functions you've compiled into your code, not an exploit. Also there is probably much less chance of causing the function pointer to be wildly wrong, compared to a wildly-wrong out-of-range array index. Ideally though you'd fix the hardware, so that the various "hidden" cache state was localized to protection domains along with the rest of memory....<br> </div> Sat, 15 Dec 2018 07:26:11 +0000 Relief for retpoline pain https://lwn.net/Articles/774978/ https://lwn.net/Articles/774978/ patrakov <div class="FormattedComment"> I don't fully understand how relpolines prevent speculation. Won't the CPU itself also learn the most common case and speculate along it? "OK, this if usually takes the true branch, and then there is a direct call right there, and then it loads this yummy stuff into memory, let's do that speculatively".<br> </div> Sat, 15 Dec 2018 06:01:01 +0000 Relief for retpoline pain https://lwn.net/Articles/774977/ https://lwn.net/Articles/774977/ josh <div class="FormattedComment"> There are other good reasons to optimize indirect calls into direct ones.
If you can figure out what code can and can't be called by a function pointer, you could optimize out the code that can't be called, and even inline the only possible code in a given kernel configuration.<br> </div> Sat, 15 Dec 2018 05:52:17 +0000
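Several comments in this thread describe the same pattern: compare a function pointer against a known, likely target, make a direct call on a match, and fall back to an indirect call otherwise. A minimal sketch of that pattern in C might look like the following. The type `handler_fn` and the functions `handle_common`, `handle_rare`, and `call_handler` are hypothetical names chosen for illustration. This is not the relpoline or optpoline implementation from the article, which patches call sites at run time, and the fallback here is a plain indirect call where a retpoline-enabled kernel build would use a retpoline.

```c
/* Hypothetical handler type; not taken from the kernel or the article. */
typedef int (*handler_fn)(int);

/* Hypothetical "hot" target that most calls are assumed to reach. */
static int handle_common(int x)
{
    return x + 1;
}

/* Hypothetical rarely used target. */
static int handle_rare(int x)
{
    return x * 2;
}

/*
 * Guarded direct call: if the pointer matches the expected hot target,
 * call it directly. The guard is a conditional direct branch, so a
 * misprediction can only pick the other arm of the if, not an
 * arbitrary address. Any other pointer takes the indirect fallback.
 */
static int call_handler(handler_fn fn, int arg)
{
    if (fn == handle_common)
        return handle_common(arg);  /* direct call; may also be inlined */
    return fn(arg);                 /* fallback: ordinary indirect call */
}
```

For example, `call_handler(handle_common, 41)` takes the direct path, while `call_handler(handle_rare, 21)` takes the fallback. The guard costs one compare and one conditional branch per call, so the pattern is only likely to pay off when one target dominates.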