It depends on what you define as 'hand optimization'.
Unrolling loops is something that is almost always better left to the compiler.
But changing the algorithm from a pointer-heavy set of linked lists to an implementation using small offsets in a buffer is something the compiler will never do for you, and it can result in huge speedups.
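To make that concrete, here is a rough sketch of what "small offsets in a buffer" could look like. The names (Pool, Node, list_push, list_sum) and the fixed capacity are made up for illustration, not taken from any particular codebase. Nodes live in one contiguous array and link to each other with 16-bit indices instead of full pointers, so each node is smaller and a traversal stays within a few cache lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_CAP 1024          /* hypothetical fixed capacity, must stay below NIL */
#define NIL      UINT16_MAX    /* sentinel index meaning "no next node" */

/* A node links to its successor by a 16-bit index into the pool,
 * not by an 8-byte pointer to a separately allocated node. */
typedef struct {
    int32_t  value;
    uint16_t next;
} Node;

typedef struct {
    Node     nodes[POOL_CAP];  /* all nodes live in one contiguous buffer */
    uint16_t used;             /* number of slots handed out so far */
} Pool;

static void pool_init(Pool *p) { p->used = 0; }

/* Prepend value to the list that starts at head; returns the new head index. */
static uint16_t list_push(Pool *p, uint16_t head, int32_t value) {
    assert(p->used < POOL_CAP);
    uint16_t idx = p->used++;
    p->nodes[idx].value = value;
    p->nodes[idx].next  = head;
    return idx;
}

/* Walk the list by following indices and sum the values. */
static int64_t list_sum(const Pool *p, uint16_t head) {
    int64_t sum = 0;
    for (uint16_t i = head; i != NIL; i = p->nodes[i].next)
        sum += p->nodes[i].value;
    return sum;
}
```

Start with head = NIL, push values, and pass the returned head to list_sum. Whether this actually beats a malloc-per-node list depends on the workload, so measure before committing to it.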
"Trust the compiler, write whatever you want" doesn't work well in real life. By the time you get things under a real load and find the bottlenecks, it's frequently too late to change them short of a rewrite.
You need to keep end efficiency in mind as you go along. This requires that you keep up to date with what sorts of things are cheap to do and what are expensive to do. If you get it right, you are an unsung hero (you seldom get thanks for things that don't crumple under load); if you get it wrong, you get ridiculed.