matrix multiply optimization
Posted Oct 27, 2007 21:44 UTC (Sat) by giraffedata
Parent article: Memory part 5: What programmers can do
Only about a factor of 5 of the speedup is due to memory optimization. The vector-instruction optimization simply applies more CPU horsepower.
What I can't figure out is how the transposition speeds anything up. The article points out that it removes 1000 non-sequential accesses per column from the multiplication loop, but I see those same 1000 non-sequential accesses per column being added to the transposition loop.
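One way to make the access-count comparison concrete is to count, in each loop, the reads that step through mul2 in column order (n elements apart). The sketch below is only an illustration under stated assumptions, not the article's code: it assumes n x n row-major matrices of doubles, and the function names (multiply_naive, transpose, multiply_transposed, compare_strategies) are hypothetical. The counts it reports expose how often each column gets walked in each loop.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: n x n row-major matrices of doubles. Each counting
   function returns how many reads of mul2 it made in column order. */

/* Naive product: the inner loop walks column j of mul2, and that whole
   column walk is repeated once for every row i of the result. */
static unsigned long long multiply_naive(size_t n, const double *mul1,
                                         const double *mul2, double *res)
{
    unsigned long long column_reads = 0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k) {
                sum += mul1[i * n + k] * mul2[k * n + j];
                ++column_reads;
            }
            res[i * n + j] = sum;
        }
    return column_reads;
}

/* Transposition: each element of mul2 is read in column order exactly once. */
static unsigned long long transpose(size_t n, const double *mul2, double *tmp)
{
    unsigned long long column_reads = 0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            tmp[i * n + j] = mul2[j * n + i];
            ++column_reads;
        }
    return column_reads;
}

/* Product against the transposed copy: both operands are read row by row,
   so this loop makes no column-order reads at all. */
static void multiply_transposed(size_t n, const double *mul1,
                                const double *tmp, double *res)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += mul1[i * n + k] * tmp[j * n + k];
            res[i * n + j] = sum;
        }
}

/* Runs both strategies on small integer-valued matrices (so every sum is
   exact and the results can be compared with ==), stores the column-order
   read counts in the out-parameters, and returns 1 if the results match. */
static int compare_strategies(size_t n, unsigned long long *naive_reads,
                              unsigned long long *transpose_reads)
{
    assert(n > 0);
    double *mul1 = malloc(n * n * sizeof *mul1);
    double *mul2 = malloc(n * n * sizeof *mul2);
    double *tmp = malloc(n * n * sizeof *tmp);
    double *res_a = malloc(n * n * sizeof *res_a);
    double *res_b = malloc(n * n * sizeof *res_b);
    int match = 0;

    if (mul1 && mul2 && tmp && res_a && res_b) {
        for (size_t idx = 0; idx < n * n; ++idx) {
            mul1[idx] = (double)(idx % 7);
            mul2[idx] = (double)(idx % 5) - 2.0;
        }
        *naive_reads = multiply_naive(n, mul1, mul2, res_a);
        *transpose_reads = transpose(n, mul2, tmp);
        multiply_transposed(n, mul1, tmp, res_b);

        match = 1;
        for (size_t idx = 0; idx < n * n; ++idx)
            if (res_a[idx] != res_b[idx])
                match = 0;
    }
    free(mul1);
    free(mul2);
    free(tmp);
    free(res_a);
    free(res_b);
    return match;
}
```

With n = 200, the naive product makes n^3 = 8,000,000 column-order reads (n^2 = 40,000 per column of mul2, because each column is walked once per row of the result), while the transposition makes n^2 = 40,000 (n = 200 per column). For n = 1000 the same formulas give 1,000,000 column-order reads per column in the naive loop versus 1000 per column in the transposition.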