Rethinking optimization for size
Posted Feb 8, 2013 21:41 UTC (Fri) by khim
In reply to: Rethinking optimization for size
Parent article: Rethinking optimization for size
> Do hand performance tuning *only* after a proven need via profiling.
By that time it's often much too late to do anything. The one thing a programmer should keep in mind is a few numbers. If you introduce a nice level of indirection (to facilitate future expansion or some such) and that indirection triggers an access to main memory, you lose about 500-600 ticks right there. A contemporary CPU can move hundreds of sequential bytes and do thousands of operations in that time! If your program is built around a bazillion tiny objects, it's too late to do anything at that point: to remove these useless levels of indirection you basically need to rewrite the program from scratch.
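A minimal sketch of the difference, in C++ (the types and function names are invented purely for illustration): walking a pointer-chained structure versus summing the same values stored flat.

    #include <vector>

    // Indirect layout: every element lives behind a pointer, so each
    // iteration can stall on a load from main memory (hundreds of cycles
    // on a miss), and the next load depends on the previous one.
    struct Node { int value; Node* next; };

    long sum_list(const Node* head) {
        long sum = 0;
        for (const Node* n = head; n != nullptr; n = n->next)
            sum += n->value;   // dependent load: the CPU waits on memory
        return sum;
    }

    // Flat layout: the same values stored contiguously; the hardware
    // prefetcher streams them in and the loop runs at a few cycles
    // per element.
    long sum_flat(const std::vector<int>& values) {
        long sum = 0;
        for (int v : values)
            sum += v;
        return sum;
    }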
> But not as small as you think. I work on a production compiler and we are always tuning for new targets. Even something as simple as moving from Sandy Bridge to Ivy Bridge can result in a change in strategy.
Sure, but how much can you hope to win? I speak from experience: we recently rewrote a piece of code, and it went from a nice "modern" structure with five or six independent layers and a couple of dozen structures to a single (autogenerated) function with 20'000 lines of code and a dozen simple local variables. The speedup was about 10x (8x in one mode and 12x in another). Do you really believe you can achieve something like that with compiler options or small tweaks after a profiler run?
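To give a feel for the shape of that change, here is a toy sketch (the stage abstraction and the operations are invented, not our actual code): a generic pipeline dispatched through virtual calls versus one specialized, flattened function of the kind a code generator emits.

    #include <cstdint>

    // Layered version: each stage is an object reached through a virtual
    // call, so nothing can be inlined or kept in registers across stages.
    struct Stage {
        virtual uint32_t apply(uint32_t x) const = 0;
        virtual ~Stage() = default;
    };

    uint32_t run_layered(const Stage* const* stages, int n, uint32_t x) {
        for (int i = 0; i < n; ++i)
            x = stages[i]->apply(x);   // indirect call per stage
        return x;
    }

    // Flattened version: every stage specialized and inlined into one
    // straight-line function with only local variables.
    uint32_t run_flat(uint32_t x) {
        x *= 2654435761u;    // "stage 1", specialized
        x ^= x >> 16;        // "stage 2", specialized
        x += 0x9e3779b9u;    // "stage 3", specialized
        return x;
    }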
P.S. Actually we could squeeze out an additional ~30% with PGO and some other compiler tricks, but in the end we decided that it complicates our build system too much and accepted the "mere" 8x/12x speedup.
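For context, the usual PGO workflow with GCC looks roughly like this (a generic sketch with placeholder file names, not our actual build):

    # 1. Build with instrumentation
    gcc -O2 -fprofile-generate -o app app.c
    # 2. Run a representative workload to collect profile data (.gcda files)
    ./app < training-input
    # 3. Rebuild using the collected profile
    gcc -O2 -fprofile-use -o app app.c

The extra build stage and the need to maintain representative training inputs are exactly the build-system complications mentioned above.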