It's even better if you can get the compiler to do a good job of autovectorising your code. That way the code stays portable to new architectures, since there are no architecture-specific intrinsics to rewrite. I found this article the other day: http://locklessinc.com/articles/vectorize/ . It has a few examples of how GCC autovectorises, why it sometimes thinks it can't, and how to convince it that it can.
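As a rough illustration of the "convincing" part (my own sketch, not taken from the article): one common obstacle is that the compiler can't prove two pointers don't overlap. The function name `scale_add` and its parameters are made up for this example. Marking the pointers `restrict` promises GCC that the arrays don't alias, which usually lets it vectorise the loop without extra runtime overlap checks.

```c
#include <stddef.h>

/* Hypothetical example: a[i] += b[i] * k for n elements.
 * `restrict` is a promise from the caller that a and b never overlap,
 * so the compiler may load and store several elements at once.
 * Passing overlapping arrays breaks that promise (undefined behaviour). */
void scale_add(float *restrict a, const float *restrict b, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] += b[i] * k;
}
```

To see what GCC decided, something like `gcc -O3 -fopt-info-vec -c file.c` should report the loops it vectorised, and `-fopt-info-vec-missed` the ones it gave up on along with a reason. The exact messages vary between GCC versions.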