Any performance data?

Posted Jan 6, 2006 17:54 UTC (Fri) by sepreece (subscriber, #19270)
Parent article: Drawing the line on inline

I read through the discussion thread and didn't find any information about the performance impact of removing the inlining. While it's not unreasonable to think that cache hit rates would improve, it would be nice to see some data indicating what the bottom line performance change is.


Any performance data?

Posted Jan 7, 2006 14:40 UTC (Sat) by jospoortvliet (subscriber, #33164)

Point is, it is very hard to measure. It can make a difference, but no
reliable real-world benchmarks exist, afaik. It's very much a matter of
opinion, after all...

Any performance data?

Posted Jan 10, 2006 17:23 UTC (Tue) by sepreece (subscriber, #19270)

Benchmarking is hard and people use it to lie a lot. On the other hand, though, the reason people do inlining, usually, is to improve performance, so you'd think that somebody pushing reduced inlining would have done some kind of performance study to see if the changes hurt performance unacceptably.

In other words, I'm not concerned about whether the experiments are scalable or universally applicable, but it would be interesting to know whether the change had any observable effect on whatever Ingo chose to observe.

I'd be especially interested in things like interrupt latency, which I would expect Ingo to be well-prepared to measure.

Any performance data?

Posted Jan 12, 2006 22:25 UTC (Thu) by anton (guest, #25547)

The effect of code duplicating optimizations (like inlining) on cache
miss rates can be counterintuitive. E.g., Mueller and Whalley [1]
tried an optimization that increased code size by 50%, and found that
it reduced the number of cache misses.

In the context of our work on replication in interpreters we have
tried to use partial replication to reduce the code size by a factor
of two or more (compared to full replication), but found that this
increased the cache misses significantly; partial inlining resulted in
worse spatial locality, and apparently this had more influence than
the code size. This research is not yet published.

Here is a scenario where de-inlining can increase cache misses:
Consider that a function is inlined in several places in the kernel,
but these different places are called so far apart that the function
is expelled from the cache between these executions. Then de-inlining
has no benefit for the cache hit rate; but it can have a cost:

- the direct cost is that the function probably does not utilize all
of the cache line where it starts and where it ends (whereas the
inlined version would share these lines with the caller which is also
executed). I.e., de-inlining reduces spatial locality.

- the indirect cost is that optimizations enabled by inlining are
suppressed, and these optimizations may reduce the code size and thus
the cache footprint.
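
To put rough numbers on the direct cost, here is a minimal sketch that
counts the instruction-cache lines touched in the cold-cache scenario
above. Everything in it is an assumption made for illustration: the
64-byte line size, the 5-byte call instruction, the addresses and code
sizes, and the helper names (lines_touched, inlined_lines,
out_of_line_lines). It also assumes the out-of-line function lives far
from its caller, so the two share no cache lines. It says nothing about
any measured kernel behaviour.

```python
LINE = 64       # assumed cache line size in bytes
CALL_SIZE = 5   # assumed size of a call instruction in bytes


def lines_touched(start, size, line=LINE):
    """Count distinct cache lines covered by bytes [start, start + size)."""
    if size <= 0:
        return 0
    first = start // line
    last = (start + size - 1) // line
    return last - first + 1


def inlined_lines(caller_start, caller_size, callee_size):
    """Lines fetched when the callee body sits inside the caller."""
    return lines_touched(caller_start, caller_size + callee_size)


def out_of_line_lines(caller_start, caller_size, callee_start, callee_size):
    """Lines fetched when the caller holds only a call instruction and the
    callee is stored elsewhere (assumed not to share lines with the caller)."""
    caller = lines_touched(caller_start, caller_size + CALL_SIZE)
    callee = lines_touched(callee_start, callee_size)
    return caller + callee


# Hypothetical layout: a 100-byte caller at address 0 and a 40-byte
# function that, when out of line, starts 48 bytes into a cache line.
print(inlined_lines(0, 100, 40))            # 140 contiguous bytes
print(out_of_line_lines(0, 100, 4144, 40))  # caller lines plus callee lines
```

With these made-up numbers the inlined version touches 3 lines and the
out-of-line version touches 4, because the 40-byte function straddles
two lines that it shares with nothing else being executed. If the
function were still cached from an earlier call, the comparison would
of course come out differently; the sketch only covers the case where
it has been expelled in between.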

[1] Frank Mueller and David B. Whalley. Avoiding Unconditional Jumps by
Code Replication. SIGPLAN '92 Conference on Programming Language
Design and Implementation, pp. 322-330, 1992.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds