
Who is the best inliner of all?

By Jonathan Corbet
January 14, 2009
The inline keyword provided by GCC has always been a bit of a dangerous temptation for kernel programmers. In many cases, making a function inline can help performance. In some, it is mandatory; this is especially true for functions which encapsulate specific CPU instructions. But, in other cases, inlining becomes a classic example of premature optimization; at best, it does not help, while, at worst, it can significantly bloat the size of the kernel and harm performance. Since performance matters to kernel developers, the proper way of inlining functions has often been a topic of discussion. The most recent debate on the subject has made it clear, though, that there is still no real consensus on the issue.

The discussion began as an offshoot of the spinning mutex topic when Linus noticed that a posted kernel oops listing showed that the __cmpxchg() function had not been inlined. This function provides access to the x86 cmpxchg* instructions; it should expand to a single instruction. Clearly it makes sense to inline a single-instruction function, but, for whatever reason, GCC had decided not to do that.
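
For readers who have not seen such a wrapper, here is a minimal sketch - not the kernel's actual __cmpxchg() code, and with an invented name and a fixed 64-bit operand size - of what a single-instruction compare-and-exchange helper looks like with GCC inline assembly on x86-64:

    #include <stdio.h>

    /* Hypothetical stand-in for the kernel's __cmpxchg(): if *ptr == old,
     * store new into *ptr; either way, return the value *ptr held before. */
    static inline unsigned long
    my_cmpxchg(volatile unsigned long *ptr, unsigned long old, unsigned long new)
    {
        unsigned long prev;
        asm volatile("lock; cmpxchgq %2, %1"
                     : "=a" (prev), "+m" (*ptr)
                     : "r" (new), "0" (old)
                     : "memory");
        return prev;
    }

    int main(void)
    {
        volatile unsigned long v = 1;
        unsigned long prev = my_cmpxchg(&v, 1, 2);
        printf("prev=%lu now=%lu\n", prev, (unsigned long)v);
        return 0;
    }

When such a function is inlined, each call collapses to the one locked cmpxchgq instruction plus argument setup; left out of line, that same instruction gets wrapped in a full call, prologue, and return.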

Linus quickly concluded that the fault lies with the (non-default) CONFIG_OPTIMIZE_INLINING configuration option. This option, when selected, makes inline into a suggestion which GCC is free to ignore. At that point, GCC makes its own decisions, based on a set of built-in heuristics. In this case, it decided that __cmpxchg() was too complex to inline, so it made it into a separate function. Linus, in disgust, asked Ingo Molnar to remove CONFIG_OPTIMIZE_INLINING and force the compiler to honor the inline keyword.
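
The practical difference is easy to demonstrate in a small standalone sketch (function names invented): a plain inline is a hint that gcc's heuristics may override, while the always_inline attribute - roughly what the kernel maps the keyword to when CONFIG_OPTIMIZE_INLINING is disabled - leaves the compiler no choice:

    #include <stdio.h>

    /* A plain 'inline' is only a hint; gcc's heuristics may still emit an
     * out-of-line copy and call it. */
    static inline int hinted_add(int a, int b)
    {
        return a + b;
    }

    /* With always_inline, gcc must inline every call (or report an error if
     * it cannot); this is approximately what the kernel's 'inline' expands
     * to when CONFIG_OPTIMIZE_INLINING is disabled. */
    static inline __attribute__((always_inline)) int forced_add(int a, int b)
    {
        return a + b;
    }

    int main(void)
    {
        printf("%d %d\n", hinted_add(1, 2), forced_add(3, 4));
        return 0;
    }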

Some other developers agreed with this request - but not all. GCC will certainly still make mistakes, but there is also a growing feeling that, with more recent versions of the compiler, GCC is able to make good decisions most of the time. If GCC is also given the power to inline functions which have not been explicitly marked by the developer, the results can be even better. There are hazards, though, to giving GCC an overly free hand: excessive inlining can create stack usage problems and make debugging harder. But these are problems that some developers are willing to accept if the benefits are strong enough.

Ingo ran a long series of tests to see what happens when GCC is given free rein over the inlining of functions. His results were fairly clear: recent GCC, when allowed to make its own inlining decisions, produces a kernel that is 1-7% smaller than the kernel which results from strictly following inline declarations. From that data, Ingo concludes that the best solution is to use the inlining features built into the compiler:

Today we have in excess of thirty thousand 'inline' keyword uses in the kernel, and in excess of one hundred thousand kernel functions. We had a decade of hundreds of inline-tuning patches that flipped inline attributes on and off, with the goal of doing that job better than the compiler.

Still a sucky compiler who was never faced with this level of inlining complexity before (up to a few short months ago when we released the first kernel with non-CONFIG_BROKEN-marked CONFIG_OPTIMIZE_INLINING feature in it) manages to do a better job at judging inlining than a decade of human optimizations managed to do. (If you accept that 1% - 3% - 7.5% code size reduction in important areas of the kernel is an improvement.)

Linus, however, is unimpressed. In his point of view, the kernel size reduction provided by automated inlining does not outweigh the drawbacks:

It's not about size - or necessarily even performance - at all. It's about abstraction, and a way of writing code.

And the thing is, as long as gcc does what we ask, we can notice when _we_ did something wrong. We can say "ok, we should just remove the inline" etc. But when gcc then essentially flips a coin, and inlines things we don't want to, it dilutes the whole value of inlining - because now gcc does things that actually does hurt us.

We get oopses that have a nice symbolic back-trace, and it reports an error IN TOTALLY THE WRONG FUNCTION, because gcc "helpfully" inlined things to the point that only an expert can realize "oh, the bug was actually five hundred lines up, in that other function that was just called once, so gcc inlined it even though it is huge".

See? THIS is the problem with gcc heuristics. It's not about quality of code, it's about RELIABILITY of code.

The reason people use C for system programming is because the language is a reasonably portable way to get the expected end results WITHOUT the compiler making a lot of semantic changes behind your back.

Linus would rather that the inline keyword be considered mandatory by the compiler. Then, if there are too many inline functions in the kernel (and 30,000 of them does seem like a fairly high number), the unnecessary inline keywords should be removed. There was some talk of adding some sort of inline_hint keyword for cases where inlining is just a suggestion, but there is not much enthusiasm for that approach.

The problem with the all-manual approach - even assuming that it can yield the best results - was perhaps best expressed by Ingo:

In this cycle alone, in the past ~2 weeks we added another 1300 inlines to the kernel. Do we really want periodic postings of:

[PATCH 0/135] inline removal cleanups

... in the next 10 years? We have about 20% of all functions in the kernel marked with 'inline'. It is a _very_ strong habit. Is it worth fighting against it?

Solving excessive use of inline functions by diluting the meaning of the inline keyword may look like a misdirected solution. But the alternative would require much more attentive review of kernel patches before they go into the mainline. History suggests that getting that level of review is an uphill battle at best. History also shows that compilers tend to be better than programmers at making this kind of decision, especially when behavior over an entire body of code (as opposed to in a single function) is considered. But it may be a while, yet, before the development community as a whole is willing to put that level of trust into its tools.



Who is the best inliner of all?

Posted Jan 15, 2009 9:47 UTC (Thu) by michaeljt (subscriber, #39183) [Link]

> We get oopses that have a nice symbolic back-trace, and it reports an error IN TOTALLY THE WRONG FUNCTION, because gcc "helpfully" inlined things to the point that only an expert can realize "oh, the bug was actually five hundred lines up, in that other function that was just called once, so gcc inlined it even though it is huge".

Smarter debug symbols/information might also help.

Who is the best inliner of all?

Posted Jan 19, 2009 1:00 UTC (Mon) by i3839 (guest, #31386) [Link]

No, that won't help. Because all the information already exists in the debugging info, but loading all that debugging info (like line numbers) in a running kernel would waste way too much memory. Much more than is saved by aggressive inlining.

In this particular case it was about static functions with one caller, which in general makes perfect sense to inline, but not for the kernel, because when making a backtrace the kernel only has access to the information loaded in memory at that time.

Who is the best inliner of all?

Posted Jan 19, 2009 8:40 UTC (Mon) by michaeljt (subscriber, #39183) [Link]

Why does the debug information have to be in memory at all when the kernel is running? The symbol resolution in the stack trace is not needed until the report is sent off, so it can be delayed until then.

Who is the best inliner of all?

Posted Jan 19, 2009 9:12 UTC (Mon) by dlang (✭ supporter ✭, #313) [Link]

For the symbol resolution to be done later, someone will have to manually transcribe the oops message (most of the time). When the kernel knows that it's in bad shape, you really don't want to have it writing to a disk (it may end up writing over your data).

Linus's statement about digital cameras being more useful than crash dumps for kernel debugging is because it's easy to take a picture of an oops and send it out; it's much more work (and therefore fewer people bother) to gather the oops in other ways.

Who is the best inliner of all?

Posted Jan 23, 2009 22:41 UTC (Fri) by oak (guest, #2786) [Link]

> for the symbol resolution to be done later, someone will have to
> manually transcribe the oops message (most of the time), when
> the kernel knows that it's in bad shape you really don't want
> to have it writing to a disk (it may end up writing over your data)

Maybe on production systems, but on test systems you really do want those
oopses stored automatically somewhere (separate disk partition without
filesystem has worked without problems for over a year).

If disk is too risky for oops information, send it to a serial port or over the
network and have something at the other end that automatically resolves the
oopses properly with the kernel debug symbols (which you've separated from the
kernel binary after it was built).

Who is the best inliner of all?

Posted Jan 20, 2009 1:31 UTC (Tue) by i3839 (guest, #31386) [Link]

Besides the reasons given by dlang, another one is that doing that is only possible if you have the debugging info of the running kernel, which might not always be the case (either for the user or the person receiving a raw oops).

And finally, it may not be strictly necessary and you could ask people to decipher cryptic messages one way or the other, but debugging the kernel should be as easy as possible. If you want to move the symbol resolution to user space to save kernel memory, you had better also move all the printk strings to user space, as that saves a lot of memory as well.

Who is the best inliner of all?

Posted Jan 15, 2009 12:11 UTC (Thu) by etienne_lorrain@yahoo.fr (guest, #38022) [Link]

Inlining may also depend on the processor you are compiling for: the more registers there are available, the more efficient it is to inline functions.
Do we want ia64_inline and ia32_inline defines?
Inlining does not only remove call/return costs, but also enables optimisations when arguments are constants, and guarantees that some external variables are not modified by the function call - so there is no need to update memory before the call and reload all external variables into registers after the call.
Because most sources have been written and optimised for ia32 (few registers), those optimisations have not been considered worth the effort.
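
A small standalone sketch (names invented) of the effect described above: once the helper's body is visible at the call site, constant arguments can be folded and the compiler can see that globals are untouched, so nothing needs to be spilled before the call or reloaded afterward:

    #include <stdio.h>

    static int counter;

    static inline int scale(int x, int shift)
    {
        return x << shift;
    }

    int demo(int x)
    {
        counter++;              /* inlined scale() visibly cannot touch 'counter' */
        int y = scale(x, 3);    /* with inlining this folds to 'x << 3'            */
        return counter + y;     /* so 'counter' need not be reloaded from memory   */
    }

    int main(void)
    {
        printf("%d\n", demo(5));
        return 0;
    }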

Who is the best inliner of all?

Posted Jan 24, 2009 6:40 UTC (Sat) by HalfMoon (guest, #3211) [Link]

Inlining may also depend on the processor you are compiling for: the more registers there are available, the more efficient it is to inline functions.

Also, how good the optimizer is. Even relatively recent versions of GCC seem to have a hard time understanding how, for example, to turn register access functions into single ARM instructions ... unless you hit them over the head with an inline annotation. I've shrunk drivers' I-space footprint by ten to twenty percent, in some cases, with simple tricks like that.

Contrariwise, sometimes discrete copies of functions are better.

GCC isn't actually known for good inlining, and something that works well on x86 (or, one particular flavor of x86) will sometimes really hurt other processors.
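
The register accessors being described are, roughly, memory-mapped I/O read wrappers. A minimal sketch - the names are invented, and an ordinary variable stands in for what would really be a fixed device address - shows why inlining matters so much here: inlined, each call is a single load; out of line, it is a call, a load, and a return:

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for a device register; on real hardware this would be a
     * fixed MMIO address, not an ordinary variable. */
    static volatile uint32_t fake_status_reg = 0x80;

    /* When inlined, this collapses to a single load from the register. */
    static inline uint32_t reg_read(volatile uint32_t *reg)
    {
        return *reg;
    }

    int main(void)
    {
        printf("status = 0x%x\n", (unsigned)reg_read(&fake_status_reg));
        return 0;
    }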

Who is the best inliner of all?

Posted Jan 15, 2009 14:03 UTC (Thu) by kpfleming (subscriber, #23250) [Link]

In addition, as I've been playing with GCC 4.3's --combine and -fwhole-program options lately over various code bases, I've found two things:

- GCC does an amazing job of poring over a complete 'program' and optimizing it when given the chance. Most programs (for perfectly valid reasons) are broken up into many source files for ease of maintenance, but this removes a large number of optimization opportunities. In the kernel, this means that the only functions that will ever be inlined are those defined in header files, so in a subsystem or driver that consists of 20+ source files, when 50% of the functions in those files have only one or two call sites, they still don't get inlined.

- Allowing more aggressive optimization has actually found real bugs in some of the code bases I've been working on, as the compiler has been able to see inside called functions and then report useful things like uninitialized variable usage that it could not do before.

Who is the best inliner of all?

Posted Jan 15, 2009 14:56 UTC (Thu) by dwmw2 (subscriber, #2063) [Link]

"GCC does an amazing job of poring over a complete 'program' and optimizing it when given the chance. Most programs (for perfectly valid reasons) are broken up into many source files for ease of maintenance, but this removes a large number of optimization opportunities."
For the Linux kernel, this is especially true in file system code, I believe. At http://lwn.net/Articles/197097/ there is a reference to some work I did a while back on building the kernel with -fwhole-program --combine.

I should dig that out again.

"- Allowing more aggressive optimization has actually found real bugs in some of the code bases I've been working on, as the compiler has been able to see inside called functions and then report useful things like uninitialized variable usage that it could not do before."
Shows up a few compiler bugs too...

Who is the best inliner of all?

Posted Jan 16, 2009 19:16 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

In the kernel, this means that the only functions that will ever be inlined are those defined in header files,

What about the functions in the same .c file as their callers? There are plenty of those in the kernel.

Does -combine totally combine, like #including all the files into a master file, or does it keep them properly modular? I hate having inline functions in header files because the problems of interference between the header file and the file into which it is #included limit what the inline function can do. But if -combine solves that problem, I can see having usable source code libraries.

Who is the best inliner of all?

Posted Jan 16, 2009 21:02 UTC (Fri) by kpfleming (subscriber, #23250) [Link]

Sorry, I was unclear there: I should have said that those are the only functions that will ever be inlined *except those in the same source file*.

Regarding -combine, essentially what happens is that each source file is run through cpp separately, then the results are concatenated and fed to the compiler. This avoids preprocessor-level conflicts.

However, keep in mind that -combine alone, while it does provide some benefit, doesn't accomplish most of the desired result. -fwhole-program does that, because when there are multiple source modules, most of the functions will *not* be 'static', as they are cross-module references, and this can interfere with optimization and (especially) inlining. -fwhole-program overrides this, and forces everything to be 'static' scope except those items that are marked with the 'externally_visible' attribute. To gain the most benefits, you need to use both.
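
As a rough standalone illustration of that model (file names and functions invented), under -fwhole-program gcc may treat every function as local to the program unless it is explicitly marked otherwise:

    /* Built, hypothetically, with something like:
     *   gcc -O2 --combine -fwhole-program main.c other.c -o prog
     * Under -fwhole-program, gcc assumes nothing outside this program calls
     * these functions, so 'helper' can be inlined and discarded entirely. */
    #include <stdio.h>

    static int helper(int x)
    {
        return x * 2;
    }

    /* Anything that genuinely must remain visible to the outside world
     * (a plugin entry point, say) gets the attribute explicitly. */
    __attribute__((externally_visible)) int api_entry(int x)
    {
        return helper(x) + 1;
    }

    int main(void)
    {
        printf("%d\n", api_entry(20));
        return 0;
    }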

Who is the best inliner of all?

Posted Jan 17, 2009 22:27 UTC (Sat) by jreiser (subscriber, #11027) [Link]

How much of the fuss is due to not having or not using appropriate measurement tools? It seems to me that there might be a small number of cases (say, 10) where inline is required for functional correctness. In the other 29990 cases, #define inline /*empty*/ should work, and the justification for an actual inline should be a measurement of the increase in speed, or decrease in size. The measurement should be documented enough so that it can be repeated and verified (say, once per year) as compilers, machines, and usage change.
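
A toy version of that experiment might look like the following standalone sketch (not kernel code); the point is only that demoting the keyword is a one-line change, after which size and speed can be measured with and without it:

    #include <stdio.h>

    /* Neutralize the keyword: every 'inline' below this point disappears,
     * leaving gcc to make its own inlining decisions. */
    #define inline /* nothing */

    static inline int add_one(int x)    /* now just: static int add_one(int x) */
    {
        return x + 1;
    }

    int main(void)
    {
        printf("%d\n", add_one(41));
        return 0;
    }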

Kernel, meet gcc

Posted Jan 18, 2009 13:01 UTC (Sun) by rwmj (subscriber, #5474) [Link]

I find it strange that there's this "wall" between the kernel developers and the gcc developers. If
there are problems with gcc's inlining, why don't the kernel developers submit patches to gcc to fix
them (or at least post test cases so the gcc devs can take a look)?

Kernel, meet gcc

Posted Jan 19, 2009 1:44 UTC (Mon) by i3839 (guest, #31386) [Link]

At least two reasons:

- The kernel is special, not at all like most other code, so gcc behaviour that is considered wrong for the kernel is in other cases perfectly fine.

- The kernel supports all kinds of gcc versions, so fixing something in newer versions isn't enough. The problem is worked around one way or the other, reducing the need to change gcc.

Another reason is that optimising is a very difficult problem, and cases where gcc does the wrong thing are in general not easy to fix, because that needs a lot of restructuring.

As the posted numbers have shown, gcc almost always does the right thing. Unfortunately, doing it wrong for only a couple of functions can be quite bad.

Kernel, meet gcc

Posted Jan 19, 2009 3:53 UTC (Mon) by dlang (✭ supporter ✭, #313) [Link]

in part because the gcc guys respond to the bug reports with claims that gcc is working as designed and there is nothing to fix.

Kernel, meet gcc

Posted Jan 20, 2009 4:36 UTC (Tue) by dvdeug (subscriber, #10998) [Link]

Which seems to be a pretty common reaction of kernel people, too, when people want to ignore what Posix says and what standard usage is, in exchange for "Do (what I think is) The Right Thing (for me)".
