> Which are not applicable at all if you have arrays of various complex
How did you come to that conclusion? Sometimes vectorizing them is not worth it, but for smallish structs it can be a win. Complex numbers are a good example.
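To make "arrays of smallish structs" concrete, here is a minimal sketch of the kind of loop meant: an element-wise multiply over an array of two-double structs. The `cplx` type and `cmul_array` name are hypothetical (C99's `double complex` would work too). Whether a given compiler actually vectorizes this depends on the compiler, flags (e.g. -O3) and target; the point is only that nothing in the loop prevents it.

```c
#include <stddef.h>

/* A small struct of two doubles, stored contiguously in an array. */
typedef struct {
    double re;
    double im;
} cplx;

/* Element-wise complex multiply: out[i] = a[i] * b[i].
   A plain loop over the struct array with no calls and no aliasing
   tricks; an optimizing compiler may vectorize it. */
void cmul_array(size_t n, const cplx *a, const cplx *b, cplx *out)
{
    for (size_t i = 0; i < n; ++i) {
        double re = a[i].re * b[i].re - a[i].im * b[i].im;
        double im = a[i].re * b[i].im + a[i].im * b[i].re;
        out[i].re = re;
        out[i].im = im;
    }
}
```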
> any function call tend to break all these nice techniques
Only if the function call is still there after inlining, and even then the compiler can sometimes accomplish the task, depending on how aggressive its developers were.
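A minimal sketch of why a call in the source does not necessarily mean a call in the generated code, assuming an optimizing compiler: the helper below is `static inline` and visible in the same translation unit, so it will usually be inlined into the loop, after which there is no call left to get in the way of vectorization. The function names are hypothetical.

```c
#include <stddef.h>

/* Small helper visible in this translation unit; an optimizing
   compiler will usually inline it into the caller's loop. */
static inline float scale_add(float x, float y, float a)
{
    return a * x + y;
}

/* y[i] = a * x[i] + y[i], written through the helper. After
   inlining, the loop body is plain arithmetic on x[i] and y[i]. */
void saxpy_via_helper(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = scale_add(x[i], y[i], a);
}
```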
> Of course nowadays it's often better to use CUDA or OpenCL to push all
> that to the GPU
No. There is never a case where CUDA or OpenCL is a good idea. Only poor compilers have led us down that path. Look at what's being done with OpenACC, which offloads ordinary loops to the GPU through compiler directives. A good compiler can match CUDA performance pretty easily and can often dramatically outperform it.
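For readers who have not seen OpenACC, a minimal sketch of what the directive-based approach looks like: a dot product where a single pragma asks the compiler to offload the loop and perform the reduction. With an OpenACC-capable compiler (for example `nvc -acc`, or `gcc -fopenacc` on a build configured for offloading) the loop may run on the GPU; any other C compiler ignores the pragma and runs it serially. The function name is hypothetical, and this sketch says nothing about performance either way.

```c
#include <stddef.h>

/* Dot product with an OpenACC directive. The source stays plain C:
   copyin() describes which data to move to the device, and
   reduction(+:sum) tells the compiler how to combine partial sums. */
double dot(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    #pragma acc parallel loop reduction(+:sum) copyin(x[0:n], y[0:n])
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```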
> Add one additional argument - and spill is inevitable.
In the grand scheme of things, one extra store to and load from cache usually isn't critical. Sure, there are cases where this kind of optimization is very important, for example if the callback is called in a tight loop. But in that case a better solution is often to refactor the code or rework the data structures. It should not be necessary to hand-linearize array accesses to get performance.
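One common refactoring for the tight-loop case, sketched under the assumption that the callback's interface can be changed: instead of one indirect call per element, pass the whole array to the callback once and let the callee own the loop. All names here (`apply_each`, `apply_batch`, `scale_one`, `scale_all`) are hypothetical.

```c
#include <stddef.h>

/* Per-element callback: one indirect call per element. Unless the
   compiler can prove which function f points to, it cannot inline
   it, so the loop body stays opaque to the optimizer. */
typedef double (*elem_fn)(double x, void *ctx);

void apply_each(size_t n, double *data, elem_fn f, void *ctx)
{
    for (size_t i = 0; i < n; ++i)
        data[i] = f(data[i], ctx);
}

/* Batch callback: one call per array. The tight loop lives inside
   the callee, where the compiler sees the whole body. */
typedef void (*batch_fn)(size_t n, double *data, void *ctx);

void apply_batch(size_t n, double *data, batch_fn f, void *ctx)
{
    f(n, data, ctx);
}

/* Example callbacks: multiply by a factor passed through ctx. */
static double scale_one(double x, void *ctx)
{
    return x * *(const double *)ctx;
}

static void scale_all(size_t n, double *data, void *ctx)
{
    const double k = *(const double *)ctx;
    for (size_t i = 0; i < n; ++i)
        data[i] *= k;
}
```

Both versions compute the same result; the batch version simply moves the loop to where it can be optimized as a unit.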
> You may say that code which calls a callback in a tight loop is hopeless
> in a first place
Actually, I would say that's often good design, for the reasons you give. But then this kind of code usually isn't dealing with numerical arrays or arrays of very small structs that would vectorize well. I suppose that is what you were thinking of with your first statement.
If that's the kind of code you're concerned about, then yes, hand-linearizing array accesses is probably not going to hurt much. But it won't really help either. I have no problem with a loop that iterates over such arrays using a single pointer. I am more concerned about people who take multidimensional arrays and translate the accesses into convoluted pointer arithmetic.
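A minimal sketch of the contrast, using a C99 variably modified parameter so the compiler knows the row length: the first function keeps plain `m[i][j]` indexing, the second is the same loop hand-linearized into pointer arithmetic. Both compute the same sum; the names are hypothetical, and the sketch makes no claim about which one a particular compiler optimizes better.

```c
#include <stddef.h>

/* Readable version: the parameter type carries the row length,
   so m[i][j] is ordinary 2-D indexing. */
double sum_2d(size_t rows, size_t cols, double m[rows][cols])
{
    double s = 0.0;
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            s += m[i][j];
    return s;
}

/* Hand-linearized version of the same loop: the 2-D structure is
   hidden behind manual offset computation on a flat pointer. */
double sum_linearized(size_t rows, size_t cols, const double *m)
{
    double s = 0.0;
    for (size_t i = 0; i < rows; ++i) {
        const double *row = m + i * cols;
        for (size_t j = 0; j < cols; ++j)
            s += *(row + j);
    }
    return s;
}
```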
Still, if the objects are small enough, vectorizing just the loads and stores of the data can often be a win, even if the actual data manipulation isn't vectorized.