A survey of direct methods for sparse linear systems

TA Davis, S Rajamanickam, WM Sid-Lakhdar - Acta Numerica, 2016 - cambridge.org
Wilkinson defined a sparse matrix as one with enough zeros that it pays to take advantage of
them. This informal yet practical definition captures the essence of the goal of direct …

StarPU-MPI: Task programming over clusters of machines enhanced with accelerators

C Augonnet, O Aumage, N Furmento, R Namyst… - Recent Advances in the …, 2012 - Springer
GPUs clusters are becoming widespread HPC platforms. Exploiting them is however
challenging, as this requires two separate paradigms (MPI and CUDA or OpenCL) and …

Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes

X Lacoste, M Faverge, G Bosilca… - … Parallel & Distributed …, 2014 - ieeexplore.ieee.org
The ongoing hardware evolution exhibits an escalation in the number, as well as in the
heterogeneity, of computing resources. The pressure to maintain reasonable levels of …

Data partitioning on heterogeneous multicore and multi-GPU systems using functional performance models of data-parallel applications

Z Zhong, V Rychkov… - 2012 IEEE international …, 2012 - ieeexplore.ieee.org
Transition to hybrid CPU/GPU platforms in high performance computing is challenging in the
aspect of efficient utilisation of the heterogeneous hardware and existing optimised software …

Exploiting symmetry in tensors for high performance: Multiplication with symmetric tensors

MD Schatz, TM Low, RA van de Geijn, TG Kolda - SIAM Journal on Scientific …, 2014 - SIAM
Symmetric tensor operations arise in a wide variety of computations. However, the benefits
of exploiting symmetry in order to reduce storage and computation is in conflict with a desire …

Unleashing the high-performance and low-power of multi-core DSPs for general-purpose HPC

FD Igual, M Ali, A Friedmann, E Stotzer… - SC'12: Proceedings …, 2012 - ieeexplore.ieee.org
Take a multicore Digital Signal Processor (DSP) chip designed for cellular base stations and
radio network controllers, add floating-point capabilities to support 4G networks, and out of …

Optimizing tensor contractions in CCSD(T) for efficient execution on GPUs

J Kim, A Sukumaran-Rajam, C Hong… - Proceedings of the …, 2018 - dl.acm.org
Tensor contractions are higher dimensional analogs of matrix multiplications, used in many
computational contexts such as high order models in quantum chemistry, deep learning …

Scheduling and memory optimizations for sparse direct solver on multi-core/multi-GPU cluster systems

X Lacoste - 2015 - theses.hal.science
The ongoing hardware evolution exhibits an escalation in the number, as well as in the
heterogeneity, of computing resources. The pressure to maintain reasonable levels of …

Bridging the gap between performance and bounds of Cholesky factorization on heterogeneous platforms

E Agullo, O Beaumont, L Eyraud-Dubois… - 2015 IEEE …, 2015 - ieeexplore.ieee.org
We consider the problem of allocating and scheduling dense linear application on fully
heterogeneous platforms made of CPUs and GPUs. More specifically, we focus on the …

Improving the user experience of the rCUDA remote GPU virtualization framework

C Reano, F Silla, A Castelló, AJ Pena… - Concurrency and …, 2015 - Wiley Online Library
Graphics processing units (GPUs) are being increasingly embraced by the
high-performance computing community as an effective way to reduce execution time by …