Preparing sparse solvers for exascale computing

H Anzt, E Boman, R Falgout… - … of the Royal …, 2020 - royalsocietypublishing.org
Sparse solvers provide essential functionality for a wide variety of scientific applications.
Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi …

The Kokkos ecosystem: Comprehensive performance portability for high performance computing

C Trott, L Berger-Vergiat, D Poliakoff… - … in Science & …, 2021 - ieeexplore.ieee.org
State-of-the-art engineering and science codes have grown in complexity dramatically over
the last two decades. Application teams have adopted more sophisticated development …

Software for sparse tensor decomposition on emerging computing architectures

ET Phipps, TG Kolda - SIAM Journal on Scientific Computing, 2019 - SIAM
In this paper, we develop software for decomposing sparse tensors that is portable to and
performant on a variety of multicore, manycore, and GPU computing architectures. The result …

Fast batched matrix multiplication for small sizes using half-precision arithmetic on GPUs

A Abdelfattah, S Tomov… - 2019 IEEE international …, 2019 - ieeexplore.ieee.org
Matrix multiplication (GEMM) is the most important operation in dense linear algebra.
Because it is a compute-bound operation that is rich in data reuse, many applications from …

Evaluating spatial accelerator architectures with tiled matrix-matrix multiplication

GE Moon, H Kwon, G Jeong… - … on Parallel and …, 2021 - ieeexplore.ieee.org
There is a growing interest in custom spatial accelerators for machine learning applications.
These accelerators employ a spatial array of processing elements (PEs) interacting via …

Improving scalability of parallel CNN training by adjusting mini-batch size at run-time

S Lee, Q Kang, S Madireddy… - … Conference on Big …, 2019 - ieeexplore.ieee.org
Training a Convolutional Neural Network (CNN) is a computationally intensive task, requiring
efficient parallelization to shorten the execution time. Considering the ever-increasing size of …

Addressing irregular patterns of matrix computations on GPUs and their impact on applications powered by sparse direct solvers

A Abdelfattah, P Ghysels, W Boukaram… - … Conference for High …, 2022 - ieeexplore.ieee.org
Many scientific applications rely on sparse direct solvers for their numerical robustness.
However, performance optimization for these solvers remains a challenging task, especially …

Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors

S Deshmukh, R Yokota, G Bosilca - ACM Transactions on Mathematical …, 2023 - dl.acm.org
Factorization and multiplication of dense matrices and tensors are critical, yet extremely
expensive pieces of the scientific toolbox. Careful use of low rank approximation can …

Speeding up particle track reconstruction using a parallel Kalman filter algorithm

S Lantz, K McDermott, M Reid, D Riley… - Journal of …, 2020 - iopscience.iop.org
One of the most computationally challenging problems expected for the High-Luminosity
Large Hadron Collider (HL-LHC) is determining the trajectory of charged particles during …