> Difference between contemporary CPUs is much smaller than you think.
But not as small as you think. I work on a production compiler and we are always tuning for new targets. Even something as simple as moving from Sandy Bridge to Ivy Bridge can result in a change in strategy.
> What's even worse: they often abuse "premature optimization is the root
> of all evil" mantra to introduce 3x-5x-10x slowdown.
But worse than that is programmers trying to out-guess the compiler with hand loop unrolling, conversion of array accesses to pointer arithmetic, and the like. This *kills* the compiler's ability to analyze the program and thus to make transformations that improve the code.
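To make the hand-unrolling point concrete, here is a rough sketch of the same reduction written both ways. The function names `sum_indexed` and `sum_hand_unrolled` are made up for illustration, and nothing here claims a specific result for a specific compiler: whether either version gets vectorized is something to check in the generated assembly for your target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Straightforward form: one induction variable, plain indexed access.
 * The trip count and access pattern are visible at a glance, which is
 * the shape loop optimizers are generally built to recognize. */
int64_t sum_indexed(const int32_t *a, size_t n)
{
    assert(a != NULL);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Hand-tuned form (hypothetical example of what NOT to do by default):
 * manual 4x unroll plus pointer walking and a separate remainder loop.
 * It computes the same value, but the compiler now has to see through
 * the programmer's unroll factor before it can apply its own. */
int64_t sum_hand_unrolled(const int32_t *a, size_t n)
{
    assert(a != NULL);
    const int32_t *p = a;
    int64_t total = 0;

    /* Main body: four elements per iteration. */
    for (size_t blocks = n / 4; blocks > 0; blocks--) {
        total += p[0];
        total += p[1];
        total += p[2];
        total += p[3];
        p += 4;
    }

    /* Remainder: the last n % 4 elements. */
    for (size_t rest = n % 4; rest > 0; rest--)
        total += *p++;

    return total;
}
```

Both functions return the same sum for any valid array. The second is longer, easier to get wrong, and bakes in an unroll factor that may not suit the next target.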
It is essential for the developer to take a high-level view of performance. Algorithm and data structure choice is the #1 performance decision to make. After that, the ROI decreases rapidly for the programmer. Yes, the programmer should be aware of the cost of abstractions when appropriate, but we should not throw away those abstractions on a whim. They save expensive programmer time. Do hand performance tuning *only* after profiling has proven the need.
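As a small illustration of the "algorithm first" point, consider a membership test. This is only a sketch with hypothetical names (`contains_linear`, `contains_sorted`, `cmp_int`), and it assumes the data can be kept sorted. The second version leans on the standard library's `bsearch` rather than on any hand tuning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* O(n) per lookup: scan every element until a match is found. */
bool contains_linear(const int *a, size_t n, int key)
{
    assert(a != NULL);
    for (size_t i = 0; i < n; i++)
        if (a[i] == key)
            return true;
    return false;
}

/* Three-way comparison for ints, written to avoid overflow from l - r. */
static int cmp_int(const void *x, const void *y)
{
    int l = *(const int *)x;
    int r = *(const int *)y;
    return (l > r) - (l < r);
}

/* O(log n) per lookup. Assumes `a` is sorted in ascending order. */
bool contains_sorted(const int *a, size_t n, int key)
{
    assert(a != NULL);
    return bsearch(&key, a, n, sizeof a[0], cmp_int) != NULL;
}
```

For large inputs, the drop from linear to logarithmic work per lookup will usually dwarf anything micro-tuning the linear loop could recover, and the code stays readable. Which version actually matters in a given program is still a question for the profiler.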