Rethinking optimization for size
Posted Feb 1, 2013 21:02 UTC (Fri) by khim
In reply to: Rethinking optimization for size
Parent article: Rethinking optimization for size
It won't, for example, turn on or off loop unrolling depending on how memory-bound a particular CPU tends to be.
It's impossible to predict that unless you know what your program as a whole is doing - and the compiler deals with individual functions. Even puny Atoms have 32KiB of L1 cache, and functions (even with their loops unrolled) are usually much smaller than that.
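For what it's worth, you don't need a smarter compiler for that: GCC already lets the programmer make the unrolling decision per function, which is where the necessary knowledge actually lives. A minimal sketch (function names are made up for illustration; the optimize attribute is GCC-specific):

#include <stddef.h>

/* Ask GCC to unroll loops in this one hot function; the rest of
 * the file keeps whatever -O2/-Os selected. Even fully unrolled,
 * a body like this is a few hundred bytes of code - nowhere near
 * filling a 32KiB L1 instruction cache. */
__attribute__((optimize("unroll-loops")))
void scale_hot(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * 2.0f;
}

/* Cold path: left alone, so the global flags keep it small. */
void scale_cold(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * 0.5f;
}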
If the compiler could be just a bit smarter, that would be a big win.
This way lies madness. The difference between contemporary CPUs is much smaller than you think. The aforementioned L1 cache, which presumably should be handled differently in different cases, is 32KiB on most Intel CPUs (from Atoms to Xeons) and 64KiB on AMD. L2 differs more substantially, but the difference is not large enough to matter at small (function-sized) scale - and LTO is not yet in wide use and not all that mature besides.
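If you want to see how little there is to distinguish, glibc will report the actual sizes at runtime. A quick sketch - note that the _SC_LEVEL* constants are a glibc extension, not POSIX, and sysconf() returns 0 when the kernel doesn't report a value:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long l1d = sysconf(_SC_LEVEL1_DCACHE_SIZE);
    long l1i = sysconf(_SC_LEVEL1_ICACHE_SIZE);
    long l2  = sysconf(_SC_LEVEL2_CACHE_SIZE);

    printf("L1d: %ld KiB\n", l1d / 1024);
    printf("L1i: %ld KiB\n", l1i / 1024);
    printf("L2:  %ld KiB\n", l2  / 1024);
    return 0;
}

Run it on an Atom, a Xeon and an Opteron and the L1 line barely moves.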
But most programmers don't want to spend time testing each optimization separately; they tend to just pick -O2 and let the compiler decide.
It's even worse: they often abuse the "premature optimization is the root of all evil" mantra to introduce 3x-5x-10x slowdowns. What the compiler does after that is more-or-less irrelevant. If people care about efficiency they should start thinking about efficiency first and not hope that the compiler will magically make ten levels of indirection disappear. No, they don't disappear, and the compiler can very rarely do anything about them - but it looks like few developers (except for kernel developers, of course) understand that. Instead most books explain how you can use those indirections to nicely "encapsulate" and "separate" stuff - usually without ever mentioning their price.
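To make the indirection point concrete, here's a made-up C sketch of the pattern in question. No optimization level rescues the second version unless the compiler can prove what the pointer targets - which, across translation units, it usually can't:

#include <stddef.h>
#include <stdio.h>

struct ops {
    /* one "nicely encapsulated" level of indirection */
    int (*get)(const int *ctx, size_t i);
};

int sum_direct(const int *a, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];            /* compiler can unroll and vectorize */
    return s;
}

int sum_indirect(const struct ops *ops, const int *a, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += ops->get(a, i);  /* opaque call every iteration: no
                                 inlining, no vectorization */
    return s;
}

static int get_elem(const int *ctx, size_t i) { return ctx[i]; }

int main(void)
{
    int a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    struct ops o = { .get = get_elem };
    printf("%d %d\n", sum_direct(a, 8), sum_indirect(&o, a, 8));
    return 0;
}

Stack a few more of those layers on top of each other and the 3x-10x figure stops looking like an exaggeration.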