Communication-optimal parallel algorithm for Strassen's matrix multiplication

G Ballard, J Demmel, O Holtz, B Lipshitz… - Proceedings of the …, 2012 - dl.acm.org
Parallel matrix multiplication is one of the most studied fundamental problems in distributed
and high performance computing. We obtain a new parallel algorithm that is based on …

Graph expansion and communication costs of fast matrix multiplication

G Ballard, J Demmel, O Holtz, O Schwartz - Journal of the ACM (JACM), 2013 - dl.acm.org
The communication cost of algorithms (also known as I/O-complexity) is shown to be closely
related to the expansion properties of the corresponding computation graphs. We …

Pebbling Game and Alternative Basis for High Performance Matrix Multiplication

O Schwartz, N Vaknin - SIAM Journal on Scientific Computing, 2023 - SIAM
Matrix multiplication is one of the most extensively used kernels in scientific computing.
Although subcubic algorithms exist, most high performance implementations are based on …

Matrix multiplication, a little faster

E Karstadt, O Schwartz - Journal of the ACM (JACM), 2020 - dl.acm.org
Strassen's algorithm (1969) was the first sub-cubic matrix multiplication algorithm. Winograd
(1971) improved the leading coefficient of its complexity from 7 to 6. There have been many …

Multifrontal methods: parallelism, memory usage and numerical aspects

JY L'Excellent - 2012 - theses.hal.science
Direct methods for the solution of sparse systems of linear equations are used in a wide
range of numerical simulation applications. Such methods are based on the decomposition …

A Matrix–Matrix Multiplication methodology for single/multi-core architectures using SIMD

V Kelefouras, A Kritikakou, C Goutis - The Journal of Supercomputing, 2014 - Springer
In this paper, a new methodology for speeding up Matrix–Matrix Multiplication using Single
Instruction Multiple Data unit, at one and more cores having a shared cache, is presented …

A high-performance matrix–matrix multiplication methodology for CPU and GPU architectures

V Kelefouras, A Kritikakou, I Mporas… - The Journal of …, 2016 - Springer
Current compilers cannot generate code that can compete with hand-tuned code in
efficiency, even for a simple kernel like matrix–matrix multiplication (MMM). A key step in …

Efficiently parallelizable Strassen-based multiplication of a matrix by its transpose

V Arrigoni, F Maggioli, A Massini… - Proceedings of the 50th …, 2021 - dl.acm.org
The multiplication of a matrix by its transpose, A^T A, appears as an intermediate operation in
the solution of a wide set of problems. In this paper, we propose a new cache-oblivious …

Stark: Fast and scalable Strassen's matrix multiplication using Apache Spark

C Misra, S Bhattacharya… - IEEE Transactions on Big …, 2020 - ieeexplore.ieee.org
This article presents a new fast, highly scalable distributed matrix multiplication algorithm on
Apache Spark, called Stark, based on Strassen's matrix multiplication algorithm. Stark …

Alternative Basis Matrix Multiplication is fast and stable

O Schwartz, S Toledo, N Vaknin… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Alternative basis matrix multiplication algorithms are the fastest matrix multiplication
algorithms in practice to date. However, are they numerically stable? We obtain the first …