On Improving the Memory Access Patterns During The Execution of Strassen's Matrix Multiplication Algorithm

ElGindy, H. and Ferizis, G.

    Matrix multiplication is a basic computing operation. Whereas it is basic, it is also very expensive with a straight forward technique of O(N3) run- time complexity. More complex solutions such as Strassen's algorithm exist that reduce this complexity to O(Nlog2 7); the recursive nature of such algorithms place a large burden on memory systems due to temporary storage and the lack of locality in their access patterns In this paper we propose a scheme for reordering the matrix entries stored in memory. This reordering provides two major benefits: a simple method to transform the recursive algorithm into an iterative one, and also a simple method for maintaining memory locality over the entire operation. These two features both provide an improvement in performance that grows as the problem size increases. The proposed reordering scheme has been implemented in C. Testing of our C implementation, which eliminates the need for unnecessary storage of matrix elements from previous iterations, with matrices of size up-to 2048_2048 exhibits improvement of 27.05% and 8.9% over the original algorithm and another re- ordering scheme respectively.
Cite as: ElGindy, H. and Ferizis, G. (2004). On Improving the Memory Access Patterns During The Execution of Strassen's Matrix Multiplication Algorithm. In Proc. Twenty-Seventh Australasian Computer Science Conference (ACSC2004), Dunedin, New Zealand. CRPIT, 26. Estivill-Castro, V., Ed. ACS. 109-115.
pdf (from crpit.com) pdf (local if available) BibTeX EndNote GS