Preparing sparse solvers for exascale computing
Sparse solvers provide essential functionality for a wide variety of scientific applications.
Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi …
The Kokkos ecosystem: Comprehensive performance portability for high performance computing
State-of-the-art engineering and science codes have grown in complexity dramatically over
the last two decades. Application teams have adopted more sophisticated development …
Kokkos kernels: Performance portable sparse/dense linear algebra and graph kernels
S Rajamanickam, S Acer, L Berger-Vergiat… - arXiv
Computational Science and Engineering (CSE) applications depend on performance …
Software for sparse tensor decomposition on emerging computing architectures
In this paper, we develop software for decomposing sparse tensors that is portable to and
performant on a variety of multicore, manycore, and GPU computing architectures. The result …
Fast batched matrix multiplication for small sizes using half-precision arithmetic on GPUs
A Abdelfattah, S Tomov… - 2019 IEEE international …, 2019 - ieeexplore.ieee.org
Matrix multiplication (GEMM) is the most important operation in dense linear algebra.
Because it is a compute-bound operation that is rich in data reuse, many applications from …
Evaluating spatial accelerator architectures with tiled matrix-matrix multiplication
There is a growing interest in custom spatial accelerators for machine learning applications.
These accelerators employ a spatial array of processing elements (PEs) interacting via …
Improving scalability of parallel CNN training by adjusting mini-batch size at run-time
Training Convolutional Neural Network (CNN) is a computationally intensive task, requiring
efficient parallelization to shorten the execution time. Considering the ever-increasing size of …
Addressing irregular patterns of matrix computations on GPUs and their impact on applications powered by sparse direct solvers
Many scientific applications rely on sparse direct solvers for their numerical robustness.
However, performance optimization for these solvers remains a challenging task, especially …
Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors
Factorization and multiplication of dense matrices and tensors are critical, yet extremely
expensive pieces of the scientific toolbox. Careful use of low rank approximation can …
Speeding up particle track reconstruction using a parallel Kalman filter algorithm
S Lantz, K McDermott, M Reid, D Riley… - Journal of …, 2020 - iopscience.iop.org
One of the most computationally challenging problems expected for the High-Luminosity
Large Hadron Collider (HL-LHC) is determining the trajectory of charged particles during …