LWN: Comments on "The cost of inline functions"
https://lwn.net/Articles/82495/
This is a special feed containing comments posted to the individual LWN article titled "The cost of inline functions".

The cost of inline functions
https://lwn.net/Articles/84155/
Posted by hs on Sun, 09 May 2004 07:54:59 +0000

Not necessarily. It depends a lot on the function code and on what the optimizer does: in some situations, things like constant propagation and common subexpression detection can make most of the inlined code go away.

With some optimizations it takes good judgement to decide whether to activate them or not.

Time/space tradeoff
https://lwn.net/Articles/83706/
Posted by joib on Thu, 06 May 2004 06:41:19 +0000

gcc 3.4 has some optimizations in this area. From http://gcc.gnu.org/gcc-3.4/changes.html:

- A new unit-at-a-time compilation scheme for C, Objective-C, C++ and Java, enabled via -funit-at-a-time (and implied by -O2). In this scheme a whole file is parsed first and optimized later. The following basic inter-procedural optimizations are implemented:
  * Removal of unreachable functions and variables
  * Discovery of local functions (functions with static linkage whose address is never taken)
  * On i386, these local functions use register parameter passing conventions.
  * Reordering of functions in topological order of the call graph to enable better propagation of optimizing hints (such as the stack alignments needed by functions) in the back end.
  * Call graph based out-of-order inlining heuristics which allow overall compilation unit growth to be limited (--param inline-unit-growth).

  Overall, the unit-at-a-time scheme produces a 1.3% improvement for the SPECint2000 benchmark on the i386 architecture (AMD Athlon CPU).
- More realistic code size estimates used by inlining for C, Objective-C, C++ and Java. The growth of large functions can now be limited via --param large-function-insns and --param large-function-growth.

The cost of inline functions
https://lwn.net/Articles/83099/
Posted by eru on Mon, 03 May 2004 08:20:49 +0000

Doesn't this also depend a lot on the processor architecture? I used to work with a SPARC, and I found that with the GCC of the time, turning on automatic inlining was usually a pessimization for my applications. I assumed this is because non-inlined calls to simple functions are cheap on SPARC chips: most arguments are passed in registers, and the register window mechanism greatly reduces the instructions needed to save and restore the caller's register-allocated variables.

I would argue that "inline" as a language feature is just like the "register" storage class. It should not be used unless inlining really is necessary for low-level reasons; normally the compiler should be left to make inlining decisions based on its knowledge of the target processor's trade-offs.

The cost of inline functions
https://lwn.net/Articles/82951/
Posted by oak on Fri, 30 Apr 2004 14:14:22 +0000

An inline function can make code smaller if the inlined code (e.g. a single struct lookup) is smaller than the instructions required for setting up a function call.
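
A minimal C sketch of the point oak makes above, using hypothetical names (struct buffer, buf_len) that do not come from the article: the body of a trivial accessor is smaller than the call sequence it would otherwise require, and once inlined it is also exposed to the constant propagation that hs describes earlier in this feed.

    /* Hypothetical example: a one-load accessor whose inlined body is
     * smaller than the argument setup, call, and return it replaces. */
    #include <stddef.h>

    struct buffer {
        size_t len;
        char  *data;
    };

    /* Single struct lookup: inlined, this is one load from b->len. */
    static inline size_t buf_len(const struct buffer *b)
    {
        return b->len;
    }

    int buf_is_empty(const struct buffer *b)
    {
        /* After inlining, this compiles down to a load and a compare;
         * no call frame is ever built. */
        return buf_len(b) == 0;
    }

Compiled with inlining enabled, buf_is_empty reduces to a load and a compare; compiled without it, it additionally pays for argument setup, the call/return pair, and any caller-saved registers.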
Time/space tradeoff
https://lwn.net/Articles/82812/
Posted by jzbiciak on Thu, 29 Apr 2004 17:55:01 +0000

There still is a time/space tradeoff of sorts; it's just that time as a function of space is not a monotonically decreasing function.

That's been true for a long time. The difference is that the crossover between negative slope (bigger space == less time) and positive slope (bigger space == more time) keeps moving toward smaller code sizes.

One of the benefits of inlining, aside from eliminating the call/return, is that it opens new optimization opportunities by optimizing across the caller/callee boundary. In effect, it allows the called function to be specialized for the context from which it was called. For instance, one of the operands to a function might be a flag that enables or disables some feature controlled by the function. If that flag is a constant at the call site, entire code paths in the callee might become dead code. [A sketch of this effect appears at the end of this feed.]

It would be interesting to see GCC start specializing functions in this manner without having to inline them, so we keep this secondary benefit while avoiding code bloat. Of course, this is relevant only if GCC can see multiple callers that would benefit from the same specializations. For instance, how many times is kmalloc called with "GFP_KERNEL"? Many. Would an automatic specialization for kmalloc(size, GFP_KERNEL) result in a performance benefit? Possibly.

The cost of inline functions
https://lwn.net/Articles/82751/
Posted by alspnost on Thu, 29 Apr 2004 13:59:02 +0000

This is presumably the same reasoning behind the frequent observation that -Os-optimized binaries are faster than -O2 ones in many cases?

The cost of inline functions
https://lwn.net/Articles/82631/
Posted by jamesm on Thu, 29 Apr 2004 02:10:02 +0000

It's also interesting to note that the HTTP workload was not affected in a meaningful way.
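
To illustrate the flag-specialization effect jzbiciak describes above, here is a minimal C sketch. The names my_alloc, ALLOC_CAN_SLEEP, and ALLOC_ZERO are hypothetical stand-ins for an allocator like kmalloc and its GFP flags, not real kernel APIs.

    /* Hypothetical allocator with a feature flag. When the flag argument is
     * a compile-time constant at the call site, inlining lets the compiler
     * discard the disabled code path entirely. */
    #include <stdlib.h>

    #define ALLOC_CAN_SLEEP  0x1   /* stand-in for something like GFP_KERNEL */
    #define ALLOC_ZERO       0x2

    static inline void *my_alloc(size_t size, unsigned flags)
    {
        void *p = malloc(size);

        if (p && (flags & ALLOC_ZERO)) {
            /* For a constant flags value without ALLOC_ZERO, this whole
             * branch is dead code after inlining and disappears. */
            char *c = p;
            for (size_t i = 0; i < size; i++)
                c[i] = 0;
        }
        return p;
    }

    void *get_object(size_t size)
    {
        /* Constant flags: the inlined body is specialized to the
         * plain-allocation path only. */
        return my_alloc(size, ALLOC_CAN_SLEEP);
    }

Because the flags argument is constant at the call site, inlining lets the compiler prove the zeroing branch dead and drop it; this is the specialization jzbiciak suggests could, in principle, also be done without inlining when several callers share the same constant.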