Cpp-Taskflow: Fast task-based parallel programming using modern C++
In this paper we introduce Cpp-Taskflow, a new C++ tasking library to help developers
quickly write parallel programs using task dependency graphs. Cpp-Taskflow leverages the …
Achieving high performance on supercomputers with a sequential task-based programming model
The emergence of accelerators as standard computing resources on supercomputers and
the subsequent architectural complexity increase revived the need for high-level parallel …
Cpp-Taskflow: A general-purpose parallel task programming system at scale
This article introduces Cpp-Taskflow, a high-performance parallel task programming system,
to streamline the building of large and complex parallel applications. Cpp-Taskflow …
Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes
The ongoing hardware evolution exhibits an escalation in the number, as well as in the
heterogeneity, of computing resources. The pressure to maintain reasonable levels of …
Parallel programming models for dense linear algebra on heterogeneous systems
We present a review of the current best practices in parallel programming models for dense
linear algebra (DLA) on heterogeneous architectures. We consider multicore CPUs, stand …
Porting the PLASMA numerical library to the OpenMP standard
PLASMA is a numerical library intended as a successor to LAPACK for solving problems in
dense linear algebra on multicore processors. PLASMA relies on the QUARK scheduler for …
Improving performance of GMRES by reducing communication and pipelining global collectives
We compare the performance of pipelined and s-step GMRES, respectively referred to as
l-GMRES and s-GMRES, on distributed multicore CPUs. Compared to standard GMRES, s …
On runtime systems for task-based programming on heterogeneous platforms
S Thibault - 2018 - inria.hal.science
SIMULATION has become pervasive in science. Real experimentation remains an essential
step in scientific research, but simulation replaced a wide range of costly and lengthy or …
HPC Programming on Intel Many‐Integrated‐Core Hardware with MAGMA Port to Xeon Phi
This paper presents the design and implementation of several fundamental dense linear
algebra (DLA) algorithms for multicore with Intel Xeon Phi coprocessors. In particular, we …
Unified development for mixed multi-GPU and multi-coprocessor environments using a lightweight runtime environment
Many of the heterogeneous resources available to modern computers are designed for
different workloads. In order to efficiently use GPU resources, the workload must have a …