The effect of code-duplicating optimizations (like inlining) on cache
miss rates can be counterintuitive. E.g., Mueller and Whalley
tried an optimization that increased code size by 50%, and found that
it reduced the number of cache misses.
In the context of our work on replication in interpreters, we
tried to use partial replication to reduce the code size by a factor
of two or more (compared to full replication), but found that this
increased the cache misses significantly; partial replication resulted in
worse spatial locality, and apparently this had more influence than
the code size. This research is not yet published.
Here is a scenario where de-inlining can increase cache misses:
Suppose a function is inlined in several places in the kernel,
but these places are executed so far apart in time that the function
is evicted from the cache between these executions. Then de-inlining
has no benefit for the cache hit rate, but it can have a cost:
- the direct cost is that the function probably does not use all
of the cache lines where it starts and where it ends (whereas the
inlined version would share these lines with the caller, which is also
executed). I.e., de-inlining reduces spatial locality.
- the indirect cost is that optimizations enabled by inlining are
suppressed, and these optimizations may reduce the code size and thus
the cache footprint.
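To put rough numbers on the direct cost, here is a minimal sketch that counts how many instruction-cache lines a stretch of code touches. Everything in it is an illustrative assumption rather than a measurement: the 64-byte line size, the addresses, the 200-byte caller, the 40-byte function, the 5-byte call instruction, and the helper names `lines_touched` and `unused_bytes`. It also assumes the code sharing the out-of-line function's cache lines is not executed at the same time.

```python
LINE = 64  # assumed cache-line size in bytes


def lines_touched(start: int, size: int, line: int = LINE) -> int:
    """Number of cache lines covered by code occupying [start, start + size)."""
    if size <= 0:
        return 0
    first = start // line
    last = (start + size - 1) // line
    return last - first + 1


def unused_bytes(start: int, size: int, line: int = LINE) -> int:
    """Bytes fetched into the cache but not belonging to the code itself."""
    return lines_touched(start, size, line) * line - size


# Hypothetical layout: a 200-byte caller at a line-aligned address and a
# 40-byte callee that, when out of line, starts in the middle of a line.
CALLER_START, CALLER_SIZE = 0x2000, 200
CALLEE_START, CALLEE_SIZE = 0x1030, 40
CALL_INSN = 5  # assumed size of the call instruction

# Inlined: the callee's code sits inside the caller and shares its lines.
inlined = lines_touched(CALLER_START, CALLER_SIZE + CALLEE_SIZE)

# De-inlined: the caller gains a call instruction, and the callee
# occupies its own, only partially used, cache lines.
deinlined = (lines_touched(CALLER_START, CALLER_SIZE + CALL_INSN)
             + lines_touched(CALLEE_START, CALLEE_SIZE))

print(inlined, deinlined, unused_bytes(CALLEE_START, CALLEE_SIZE))
```

With these made-up numbers the inlined version touches 4 lines and the de-inlined version touches 6, with 88 bytes of the out-of-line function's two lines going unused. A different layout can of course give a different result; the point is only that the smaller binary need not have the smaller per-execution footprint.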
Frank Mueller and David B. Whalley. Avoiding Unconditional Jumps by
Code Replication. SIGPLAN '92 Conference on Programming Language
Design and Implementation, pp. 322-330, 1992.