Cpp-Taskflow: Fast task-based parallel programming using modern C++

TW Huang, CX Lin, G Guo… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
In this paper we introduce Cpp-Taskflow, a new C++ tasking library to help developers
quickly write parallel programs using task dependency graphs. Cpp-Taskflow leverages the …

Achieving high performance on supercomputers with a sequential task-based programming model

E Agullo, O Aumage, M Faverge… - … on Parallel and …, 2017 - ieeexplore.ieee.org
The emergence of accelerators as standard computing resources on supercomputers and
the subsequent architectural complexity increase revived the need for high-level parallel …

Cpp-Taskflow: A general-purpose parallel task programming system at scale

TW Huang, Y Lin, CX Lin, G Guo… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
This article introduces Cpp-Taskflow, a high-performance parallel task programming system,
to streamline the building of large and complex parallel applications. Cpp-Taskflow …

Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes

X Lacoste, M Faverge, G Bosilca… - … Parallel & Distributed …, 2014 - ieeexplore.ieee.org
The ongoing hardware evolution exhibits an escalation in the number, as well as in the
heterogeneity, of computing resources. The pressure to maintain reasonable levels of …

Parallel programming models for dense linear algebra on heterogeneous systems

J Dongarra, M Abalenkovs, A Abdelfattah… - Supercomputing …, 2015 - superfri.susu.ru
We present a review of the current best practices in parallel programming models for dense
linear algebra (DLA) on heterogeneous architectures. We consider multicore CPUs, stand …

Porting the PLASMA numerical library to the OpenMP standard

A YarKhan, J Kurzak, P Luszczek… - International Journal of …, 2017 - Springer
PLASMA is a numerical library intended as a successor to LAPACK for solving problems in
dense linear algebra on multicore processors. PLASMA relies on the QUARK scheduler for …

Improving performance of GMRES by reducing communication and pipelining global collectives

I Yamazaki, M Hoemmen, P Luszczek… - 2017 IEEE …, 2017 - ieeexplore.ieee.org
We compare the performance of pipelined and s-step GMRES, respectively referred to as
l-GMRES and s-GMRES, on distributed multicore CPUs. Compared to standard GMRES, s …

On runtime systems for task-based programming on heterogeneous platforms

S Thibault - 2018 - inria.hal.science
Simulation has become pervasive in science. Real experimentation remains an essential
step in scientific research, but simulation replaced a wide range of costly and lengthy or …

HPC Programming on Intel Many‐Integrated‐Core Hardware with MAGMA Port to Xeon Phi

J Dongarra, M Gates, A Haidar, Y Jia… - Scientific …, 2015 - Wiley Online Library
This paper presents the design and implementation of several fundamental dense linear
algebra (DLA) algorithms for multicore with Intel Xeon Phi coprocessors. In particular, we …

Unified development for mixed multi-gpu and multi-coprocessor environments using a lightweight runtime environment

A Haidar, C Cao, A YarKhan, P Luszczek… - 2014 IEEE 28th …, 2014 - ieeexplore.ieee.org
Many of the heterogeneous resources available to modern computers are designed for
different workloads. In order to efficiently use GPU resources, the workload must have a …