matrix multiply optimization
Posted Oct 27, 2007 21:44 UTC (Sat) by giraffedata (subscriber, #1954)
In reply to: Memory part 5: What programmers can do by Coren
Parent article: Memory part 5: What programmers can do
Only about a factor of 5 is due to memory optimization. The vector instruction optimization is simply applying more CPU horsepower.
What I can't figure out is how the transposition speeds anything up. The article points out that it removes 1000 non-sequential accesses per column from the multiplication loop, but I see those same 1000 non-sequential accesses per column added to the transposition loop.
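To make the comparison concrete, here is a minimal sketch that counts the non-sequential (column-wise) reads in each loop. It is not the article's code: the array names mul1, mul2, tmp and res, the size N, and the counters are all assumptions for illustration. It suggests one possible answer: the transposition walks each column once, while the untransposed multiplication walks every column of mul2 again for each row of mul1.

```c
#include <stdio.h>

#define N 100   /* assumed size for a quick run; the discussion uses 1000 */

/* Hypothetical matrices, named for illustration only. */
static double mul1[N][N], mul2[N][N], tmp[N][N], res[N][N];

/* Untransposed multiply: the inner loop reads mul2 down a column,
 * so every one of its reads is non-sequential. Returns that count. */
static long long naive_multiply(void)
{
    long long strided = 0;
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            double sum = 0.0;
            for (int k = 0; k < N; ++k) {
                sum += mul1[i][k] * mul2[k][j];   /* strided read of mul2 */
                ++strided;
            }
            res[i][j] = sum;
        }
    }
    return strided;
}

/* Transposition: each element of mul2 is read exactly once,
 * column-wise. Returns the number of those strided reads. */
static long long transpose(void)
{
    long long strided = 0;
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            tmp[i][j] = mul2[j][i];               /* strided read of mul2 */
            ++strided;
        }
    }
    return strided;
}
```

Calling transpose() returns N*N and naive_multiply() returns N*N*N, so with N = 1000 the transposition would add about 10^6 non-sequential reads while removing about 10^9 from the multiplication. This only counts accesses; it says nothing about actual cache behavior on a given machine.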