matrix multiply optimization

Posted Oct 27, 2007 21:44 UTC (Sat) by giraffedata (subscriber, #1954)
In reply to: Memory part 5: What programmers can do by Coren
Parent article: Memory part 5: What programmers can do

Only about a factor of 5 is due to memory optimization. The vector instruction optimization is simply applying more CPU horsepower.

What I can't figure out is how the transposition speeds anything up. The article points out that it removes 1000 non-sequential accesses per column from the multiplication loop, but I see those same 1000 non-sequential accesses per column added to the transposition loop.



matrix multiply optimization

Posted Oct 27, 2007 21:55 UTC (Sat) by bartoldeman (guest, #4205)

Because transposition uses O(N^2) accesses and multiplication uses O(N^3). Each access in the transposition is more expensive, but there are N times fewer of them than in the multiplication.
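
To make the counting concrete, here is a minimal sketch in C, not the article's actual code. It assumes square row-major matrices of double. The function names, the tmp scratch buffer, and the counter are all hypothetical, added only for illustration. Each function returns how many times it reads mul2 with a stride of n elements, which is the expensive non-sequential pattern.

```c
#include <stddef.h>

/* Naive multiply: the innermost loop walks mul2 down a column, so every
 * one of the n*n*n inner iterations is a strided read of mul2. */
unsigned long long mul_naive(size_t n, const double *mul1,
                             const double *mul2, double *res)
{
    unsigned long long strided = 0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k) {
                sum += mul1[i * n + k] * mul2[k * n + j]; /* stride n */
                ++strided;
            }
            res[i * n + j] = sum;
        }
    return strided;
}

/* Transpose first: the strided reads of mul2 happen only in the copy
 * loop (n*n of them); the multiply then reads both inputs sequentially. */
unsigned long long mul_transposed(size_t n, const double *mul1,
                                  const double *mul2, double *tmp,
                                  double *res)
{
    unsigned long long strided = 0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            tmp[i * n + j] = mul2[j * n + i]; /* stride n, done once */
            ++strided;
        }
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += mul1[i * n + k] * tmp[j * n + k]; /* sequential */
            res[i * n + j] = sum;
        }
    return strided;
}
```

For n = 1000 the first function would report 10^9 strided reads and the second 10^6, which is the factor of N described above. The counter only tallies access patterns; it says nothing about actual cache behaviour or timings.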

matrix multiply optimization

Posted Oct 27, 2007 22:41 UTC (Sat) by giraffedata (subscriber, #1954)

Aha. Perfectly clear now. The article neglects to explain this; I'd probably say, "the original traverses mul2 in this expensive non-sequential way 1000 times, whereas the improved version does it only once."


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds