A survey of direct methods for sparse linear systems
Wilkinson defined a sparse matrix as one with enough zeros that it pays to take advantage of
them. This informal yet practical definition captures the essence of the goal of direct …
StarPU-MPI: Task programming over clusters of machines enhanced with accelerators
GPU clusters are becoming widespread HPC platforms. Exploiting them is however
challenging, as this requires two separate paradigms (MPI and CUDA or OpenCL) and …
Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes
The ongoing hardware evolution exhibits an escalation in the number, as well as in the
heterogeneity, of computing resources. The pressure to maintain reasonable levels of …
Data partitioning on heterogeneous multicore and multi-GPU systems using functional performance models of data-parallel applications
Z Zhong, V Rychkov… - 2012 IEEE international …, 2012 - ieeexplore.ieee.org
Transition to hybrid CPU/GPU platforms in high performance computing is challenging in the
aspect of efficient utilisation of the heterogeneous hardware and existing optimised software …
Exploiting symmetry in tensors for high performance: Multiplication with symmetric tensors
Symmetric tensor operations arise in a wide variety of computations. However, the benefits
of exploiting symmetry in order to reduce storage and computation are in conflict with a desire …
Unleashing the high-performance and low-power of multi-core DSPs for general-purpose HPC
Take a multicore Digital Signal Processor (DSP) chip designed for cellular base stations and
radio network controllers, add floating-point capabilities to support 4G networks, and out of …
Optimizing tensor contractions in CCSD(T) for efficient execution on GPUs
Tensor contractions are higher dimensional analogs of matrix multiplications, used in many
computational contexts such as high order models in quantum chemistry, deep learning …
Scheduling and memory optimizations for sparse direct solver on multi-core/multi-GPU cluster systems
X Lacoste - 2015 - theses.hal.science
The ongoing hardware evolution exhibits an escalation in the number, as well as in the
heterogeneity, of computing resources. The pressure to maintain reasonable levels of …
Bridging the gap between performance and bounds of cholesky factorization on heterogeneous platforms
We consider the problem of allocating and scheduling dense linear application on fully
heterogeneous platforms made of CPUs and GPUs. More specifically, we focus on the …
Improving the user experience of the rCUDA remote GPU virtualization framework
Graphics processing units (GPUs) are being increasingly embraced by the high-performance
computing community as an effective way to reduce execution time by …