### Fast Matrix Multiplication: Implementation and Optimization

#### Henning Biermann, Universität des Saarlandes, Saarbrücken, August 10, 1996

Abstract

We present and analyze an implementation of Strassen's matrix multiplication algorithm. Our method is twice as fast as standard multiplication for matrices of size $512 \times 512$.

In 1969, Strassen discovered that it is possible to multiply $n \times n$ matrices in sub-cubic running time $O(n^{\log_2 7}) \approx O(n^{2.81})$. Since then, much effort has been spent on designing algorithms with better asymptotic running time. The current record, $O(n^{2.376})$, is due to Coppersmith and Winograd (1990).
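Strassen's idea is that the product of two $2 \times 2$ block matrices can be formed with seven block multiplications instead of the naive eight; applied recursively, this yields the $O(n^{\log_2 7})$ bound. The following sketch (our own illustration, not the paper's code) shows the seven products on a single $2 \times 2$ level, written for scalar entries since the identities hold in any ring:

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices (nested lists) using Strassen's
    seven products m1..m7 instead of the naive eight."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine the seven products into the four result entries.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]
```

In the recursive algorithm, the scalar entries above become sub-matrices and each `*` becomes a recursive call; the extra additions are what the operation-count analysis below must account for.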

It is not clear whether any of these advanced methods is of practical use. We therefore investigate Strassen's divide-and-conquer algorithm. We apply a recursive storage scheme to avoid computational overhead when accessing sub-matrices, and we combine standard multiplication with loop unrolling to speed up the end of the recursion.

We give an estimate for the number of computations and evaluate the efficiency of our optimizations. Finally, we compare our constants to experimental results.
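The operation count estimated in the paper follows a standard recurrence. As a sketch (the notation is ours): Strassen's original scheme performs 7 block multiplications and 18 block additions per level, so for $n$ above the recursion cutoff $n_0$,

```latex
T(n) = 7\,T\!\left(\tfrac{n}{2}\right) + 18\left(\tfrac{n}{2}\right)^2
\qquad (n > n_0),
\qquad\text{giving}\qquad
T(n) = O\!\left(n^{\log_2 7}\right) \approx O\!\left(n^{2.81}\right).
```

The choice of $n_0$ trades the asymptotic saving in multiplications against the constant-factor cost of the extra additions.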

Selected References:

• J. Cohen and M. Roth. On the Implementation of Strassen's Fast Multiplication Algorithm. Acta Informatica, Vol. 6, pp. 341-355, 1976.
• D. Coppersmith and S. Winograd. Matrix Multiplication via Arithmetic Progressions. Journal of Symbolic Computation, 9(3), pp. 251-280, March 1990.
• N. J. Higham. Is Fast Matrix Multiplication of Practical Use? SIAM News, pp. 12-14, November 1990.
• V. Pan. How to Multiply Matrices Faster. Lecture Notes in Computer Science, Vol. 179, xi + 212 pp., Springer-Verlag, 1984.