A survey on hardware accelerators: Taxonomy, trends, challenges, and perspectives

B Peccerillo, M Mannino, A Mondelli… - Journal of Systems …, 2022 - Elsevier
In recent years, the limits of the multicore approach emerged in the so-called “dark silicon”
issue and diminishing returns of an ever-increasing core count. Hardware manufacturers …

A survey of CPU-GPU heterogeneous computing techniques

S Mittal, JS Vetter - ACM Computing Surveys (CSUR), 2015 - dl.acm.org
As both CPUs and GPUs become employed in a wide range of applications, it has been
acknowledged that both of these Processing Units (PUs) have their unique features and …

Kokkos: Enabling manycore performance portability through polymorphic memory access patterns

HC Edwards, CR Trott, D Sunderland - Journal of parallel and distributed …, 2014 - Elsevier
The manycore revolution can be characterized by increasing thread counts, decreasing
memory per thread, and diversity of continually evolving manycore architectures. High …

OmpSs: a proposal for programming heterogeneous multi-core architectures

A Duran, E Ayguadé, RM Badia, J Labarta… - Parallel processing …, 2011 - World Scientific
In this paper, we present OmpSs, a programming model based on OpenMP and StarSs, that
can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on …

PaRSEC: Exploiting heterogeneity to enhance scalability

G Bosilca, A Bouteiller, A Danalis… - … in Science & …, 2013 - ieeexplore.ieee.org
New high-performance computing system designs with steeply escalating processor and
core counts, burgeoning heterogeneity and accelerators, and increasingly unpredictable …

Scheduling techniques for GPU architectures with processing-in-memory capabilities

A Pattnaik, X Tang, A Jog, O Kayiran… - Proceedings of the …, 2016 - dl.acm.org
Processing data in or near memory (PIM), as opposed to in conventional computational units
in a processor, can greatly alleviate the performance and energy penalties of data transfers …

A taxonomy of task-based parallel programming technologies for high-performance computing

P Thoman, K Dichev, T Heller, R Iakymchuk… - The Journal of …, 2018 - Springer
Task-based programming models for shared memory—such as Cilk Plus and OpenMP 3—
are well established and documented. However, with the increase in parallel, many-core …

DAGuE: A generic distributed DAG engine for high performance computing

G Bosilca, A Bouteiller, A Danalis, T Herault… - Parallel Computing, 2012 - Elsevier
The frenetic development of the current architectures places a strain on the current state-of-
the-art programming environments. Harnessing the full potential of such architectures is a …

Taskflow: A lightweight parallel and heterogeneous task graph computing system

TW Huang, DL Lin, CX Lin, Y Lin - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Taskflow aims to streamline the building of parallel and heterogeneous applications using a
lightweight task graph-based approach. Taskflow introduces an expressive task graph …

XKaapi: A runtime system for data-flow task programming on heterogeneous architectures

T Gautier, JVF Lima, N Maillard… - 2013 IEEE 27th …, 2013 - ieeexplore.ieee.org
Most recent HPC platforms have heterogeneous nodes composed of multi-core CPUs and
accelerators, like GPUs. Programming such nodes is typically based on a combination of …