LWN.net Logo

GCC Explorer - an interactive take on compilation

GCC Explorer - an interactive take on compilation

Posted May 24, 2012 19:48 UTC (Thu) by nybble41 (subscriber, #55106)
In reply to: GCC Explorer - an interactive take on compilation by Richard_J_Neill
Parent article: GCC Explorer - an interactive take on compilation

> Experimentally, GCC is optimising out the add() call entirely, but it isn't optimising the printf. Why one, not the other?

Well, for one thing, the definition of add() is visible in the same source file, while printf() is a library function which is only defined at runtime. For the optimization into puts(), the compiler is assuming you will link with a libc providing standards-compliant printf() and puts() functions.

I'd prefer a proper preprocessor capable of performing this sort of optimization, rather than hard-coding functions like printf() into the compiler. Common Lisp macros can work this way; Calls to (format) which are decipherable at compile-time can be reduced by a macro into a series of low-level output routines, skipping the format string entirely. Unfortunately, for that you need conditionals and loops or recursion; the C preprocessor isn't up to the task. One could possibly do something similar with C++ templates, which are apparently Turing-complete, but it would require more boredom and/or masochism than I'm ever likely to experience.


(Log in to post comments)

GCC Explorer - an interactive take on compilation

Posted May 24, 2012 19:58 UTC (Thu) by Richard_J_Neill (subscriber, #23093) [Link]

That makes some sense as to why it is this way - but still doesn't really explain why it can't be better. Assuming that we do a one-step compile (i.e. gcc knows what's being linked), then the compiler should be able to do better. For that matter, printf() isn't even explicitly linked; it's just defined by "#include stdio.h", without a corresponding "-lstdio".

Runtime linking is a wonderful thing, that reduces the size of code, and allows for shared libraries without dependency hell. But I still contend that if I ask the compiler to do -O3, then it should be able to apply the combined results of thousands of man-years of experience to generate better code than I could. So, in this instance, why not have gcc inline the printf() function, and then optimise it.

GCC Explorer - an interactive take on compilation

Posted May 24, 2012 20:27 UTC (Thu) by khim (subscriber, #9252) [Link]

But I still contend that if I ask the compiler to do -O3, then it should be able to apply the combined results of thousands of man-years of experience to generate better code than I could.

Sorry, but no. When human spends a week or month to generate code this is accepted. When compiler spends measly 1.5 hours people are becoming mightly antsy.

So, in this instance, why not have gcc inline the printf() function, and then optimise it.

Try to take a look on printf implementation one of these days. That's... quite an experience. Computed gotos and function pointers, bazillion tables and locales, I'm not sure even human can optimize it to a single string. No, I take it back: I know human can't do that because it's not possible. Output of %d depends on locale!

Compiler is clever beast, but there are limits for what it can do.

GCC Explorer - an interactive take on compilation

Posted May 24, 2012 20:49 UTC (Thu) by Richard_J_Neill (subscriber, #23093) [Link]

Wow - that's a really complicated beast! But still, the computer is capable of executing the compiled printf function. Forgetting for a moment the problem of locale, then if I write:
printf ("This is: %d\n", 42);
the computer will, at runtime, eventually have the string
"This is: 42\n"
in a buffer somewhere, which then gets sent to stdout.
So, why shouldn't that runtime execution be anticipated at compiletime?
GCC doesn't have to understand printf(), it simply needs to invoke it.

In fact, the preprocessor can do it (via stringify), but that requires significant extra work for the programmer.

[Of course, your point about locales may explain why this doesn't happen... I might never care about my program being run in locations with a different decimal format, nor wish to sacrifice performance for that, but it means that the presumed-deterministic output of printf() is actually dependent on the environment variables.]

GCC Explorer - an interactive take on compilation

Posted May 25, 2012 8:48 UTC (Fri) by ekj (guest, #1524) [Link]

It's not enough to invoke it and notice that it returns "blablah 42\n". To be able to optimise it away, gcc would have to be certain that it -always- returns precisely that. (not just that it did, this time)

GCC Explorer - an interactive take on compilation

Posted May 25, 2012 15:20 UTC (Fri) by jimparis (subscriber, #38647) [Link]

Even more to the point, printf doesn't actually return "blablah 42\n" at all -- it writes it to stdout as a side-effect. So you can't even say "simulate execution and see the result" without knowing that a particular small piece of inline assembly deep inside the libc stdio that was called near the end of printf() execution happens to trigger kernel magic. It's really just not feasible to do this type of optimization in the general case. And it's just impossible in this specific case, as mentioned by khim.

GCC Explorer - an interactive take on compilation

Posted May 28, 2012 13:59 UTC (Mon) by etienne (subscriber, #25256) [Link]

> So, why shouldn't that runtime execution be anticipated at compiletime?

IMHO C++ pushes a lot of things from the .data section (i.e. initialised variables/structures) into runtime execution/initialisation (object constructors), when compared to C.
Moreover some obvious optimisations cannot be done because objects "constant during execution" are still not initialised at compilation time, so compiler cannot do anything about them.
It is as if we would need another step in producing an executable, after linker stage we would run all object constructors which are independant of any input, and copy the resulting memory dump into the ELF file.
Note that my last sentense involves so much complexity that I sometimes better like C, like on this software where I have a grand total of 96 Kbytes for everything, on a 32 bits processor...

GCC Explorer - an interactive take on compilation

Posted May 28, 2012 14:28 UTC (Mon) by jwakely (subscriber, #60262) [Link]

IMHO C++ pushes a lot of things from the .data section (i.e. initialised variables/structures) into runtime execution/initialisation (object constructors), when compared to C.
Do you have anything more convincing than opinion? IMHO C++ puts the same things in .data, when compared to C.

C++ allows you to initialize globals in more ways than C, and those things might need dynamic initialization, but they are things that must also be dynamically initialized in C. Unless you know some way to statically execute an object's constructor in C that can't be done in C++?

Moreover some obvious optimisations cannot be done because objects "constant during execution" are still not initialised at compilation time, so compiler cannot do anything about them.

Have you looked at the C++11 keyword constexpr? It's specifically designed to allow such initialization.

struct Data {
  constexpr Data(int i, int j) : i(i), j(j) { }
  int i;
  int j;
};
constexpr Data data(1, 2);  // statically initialized
This example is clearly pointless (an array of two ints would serve in this case, and be statically initialized in C or C++03) but there are many more things you can do with constexpr.

GCC Explorer - an interactive take on compilation

Posted May 28, 2012 15:56 UTC (Mon) by etienne (subscriber, #25256) [Link]

> IMHO C++ puts the same things in .data, when compared to C.

I do agree that given the same source, you get the same thing in .data.
But when I read other people source code, because they work and think in C++, they will use classes instead of structure/base-types for the most basic things, and define virtual desctructors just in case they need it at a later time, and work with "auto ptr" to manage the list of 80000 parameters in a sorted collection.
I know the C way to manage the parameter "loglevel" as an unsigned integer for debug and as a constant for costumer versions is not as flexible as the C++ way, that because the user is not using a "setter" checking for min/max values he may write incorrect values there, but sometimes I am thinking simpler is better (mostly for such debug parameter).
In short I am not thinking the problem is in the toolchain, but in the mentality of using features just because they exists, of generalising small problems until nothing is constant anymore. For instance writing the line (without even the const keyword) inside or outside a class:
unsigned int bits_per_byte = 8;

That was my humble opinion, I know there is some clever ways those things can be sorted in C++, I have no hope to see this 60 Mbytes source code software cleaned or booting quicker (because everything is constructed at boot).

GCC Explorer - an interactive take on compilation

Posted May 25, 2012 6:59 UTC (Fri) by gughur (guest, #83256) [Link]

That's a good example where a JIT could significantly outperform
a traditional compiler. It could figure out that printf is
being called with the same locale and same format string
all the time and generate special case fast path code for that.

GCC Explorer - an interactive take on compilation

Posted May 25, 2012 10:50 UTC (Fri) by dgm (subscriber, #49227) [Link]

Except that it is not. The locale can change at anytime, so the jitter has to include a test for locale before the "fast" path. Pooof!, your improvement just (almost) went away and you added a whole lot of complexity to the machinery.

Beware of silver bullets.

GCC Explorer - an interactive take on compilation

Posted May 25, 2012 11:50 UTC (Fri) by gioele (subscriber, #61675) [Link]

> Except that it is not. The locale can change at anytime, so the jitter has to include a test for locale before the "fast" path. Pooof!, your improvement just (almost) went away and you added a whole lot of complexity to the machinery.

Except that it is still possible in most cases. For example the JRuby JIT + Java JIT will optimise various paths and functions and keep these optimised versions around until the "environment" changes. In that case it stops the world, throws away the old compiled units and create new ones taking into account the current situation. In practical terms this means that using `eval` works flawlessly but kills your performance.

GCC Explorer - an interactive take on compilation

Posted Jun 13, 2012 8:15 UTC (Wed) by massimiliano (subscriber, #3048) [Link]

The V8 Javascript VM does the same...

GCC Explorer - an interactive take on compilation

Posted May 24, 2012 21:20 UTC (Thu) by gutschke (subscriber, #27910) [Link]

It is relatively easy for GCC and glibc to come up with a new extension. This extension would allow the compiler to cleanly detect whenever it is compiling code that eventually executes in a standard run-time environment.

All we need is a new function attribute to annotate the declaration of the printf() function. For example:

extern int printf(const char * restrict format, ...)
  __attribute__((__format__(__printf__, 1, 2),
                 __runtime_function__(__printf__)));

I do not believe, this is how it is done today. But it would be easy to add to both GCC and glibc. It would allow the compiler to do optimizations, whenever it finds a suitable function declaration in <stdio.h>. But if it builds for a non-standard run-time environment, it would automatically skip doing these optimizations.

Of course, that raises the question whether it should ever try to do these optimizations in the first place. And arguably, the answer is more difficult than a simple "yes" or "no".

For example, it is quite common to have code that uses the same format string multiple times just with different arguments. If GCC always expanded the arguments, it would lose the ability to coalesce the identical format strings. For some programs, the extra space requirements in the .data segment could very well be more painful than the minute cost of computing the output at run-time. In fact, I have a hard time envisioning any program where the cost of printf() is noticeable at all.

GCC Explorer - an interactive take on compilation

Posted May 24, 2012 23:32 UTC (Thu) by Richard_J_Neill (subscriber, #23093) [Link]

> In fact, I have a hard time envisioning any program where the cost of
> printf() is noticeable at all.

I have one. I'm doing data-capture and pre-processing on a relatively slow system (mini-itx, to keep the heat down so as not to cause optical turbulence in the lab). We acquire data at about 1 MHz on 4 channels, and need to do some simple preprocessing (offset and scale) then save to file. The file is a 4-column text-file as asciihex eg "0x1234 0x4567 0x3432 0x6666\n", i.e format string is "0x%x\t0x%x\t0x%x\t0x%x\n". So the printf cost is definitely not negligible, when we call it 1 million times a second.

(Yes, we could optimise further and keep the data in raw binary form, but it's not strictly necessary yet, would prevent us adding headers, and would make for a non-human-friendly file format. But I'd certainly welcome a fast printf for the simple cases)

GCC Explorer - an interactive take on compilation

Posted May 25, 2012 11:13 UTC (Fri) by dgm (subscriber, #49227) [Link]

The think is, the solution in this case is simply to write your own formating function. It's really simple, much simpler than the kind of optimizations you were asking for.

An example:

char * to_hex(unsigned v){
    static char digits[16] = "0123456789ABCDEF";
    static char buffer[8]; /* assuming 32 bits */
    int i;
    for (i = 7; i >= 0; i--) {
        buffer[i] = digits[v & 0xF];
        v >>= 4;
    }
    return buffer;
}

GCC Explorer - an interactive take on compilation

Posted May 25, 2012 14:04 UTC (Fri) by jwakely (subscriber, #60262) [Link]

But do you call printf with compile-time constants a million times a second?

GCC Explorer - an interactive take on compilation

Posted Jul 3, 2012 3:32 UTC (Tue) by Richard_J_Neill (subscriber, #23093) [Link]

Much more significantly, what about an embedded system. On an Arduino, the runtime and storage costs of printf are severe. (typically ~3% of ROM)

So for example, I want to write:

#define OSC_MHZ 25
#define CLOCK_NS (1000/OSC_MHZ)
#define STRINGIGFY(s) STRINGIFY2(s)
#define STRINGIFY2(s) #s

Serial.print ("Clock period is " STRINGIFY(CLOCK_NS) "ns\n");

...which should print: "Clock period is 40 ns\n"
but actually prints: "Clock period is (1000/25 ns\n")

This sort of thing can't be done in the preprocessor and it's too expensive to do at runtime. The only workaround seems to be to actually manually: #define CLOCK_NS_STR "40"
which is a bug just waiting to happen.

GCC Explorer - an interactive take on compilation

Posted May 25, 2012 13:34 UTC (Fri) by vonbrand (subscriber, #4458) [Link]

It is almost exactly how gcc handles printf-like functions... from the gcc info: extern int my_printf(void *my_object, const char *my_format, ...) __attribute__((format(printf, 2, 3))); tells the compiler this function is printf-like. You can also use the attribute const on a function to express that it only uses the values of the arguments to compute its return value, so the compiler knows it can stash away the value and not recompute it.

So yes, the compiler is pretty smart, and the relevant C standard also says that standard functions can't be just overriden by the user. That is how printf("Hello, world!\n"); gets replaced (legally) by puts("Hello, world!\n");. Sure, there are lots of other changes the compiler might legally make, but at some point the extra cost in compiling outweighs the benefits; and some "optimizations" might on average turn out costlier in runtime than just leaving the code as is. As for the next suggestion, "leave it to the programmer via some attribute/special pragma," it has been shown time and again (Jon Bentley's "Writing efficient programs," sadly out of print, has some delightful stories) that programmers are terrible at predicting how to optimize code.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds