A survey on hardware accelerators: Taxonomy, trends, challenges, and perspectives
In recent years, the limits of the multicore approach emerged in the so-called “dark silicon”
issue and diminishing returns of an ever-increasing core count. Hardware manufacturers …
A survey of CPU-GPU heterogeneous computing techniques
As both CPUs and GPUs become employed in a wide range of applications, it has been
acknowledged that both of these Processing Units (PUs) have their unique features and …
Kokkos: Enabling manycore performance portability through polymorphic memory access patterns
HC Edwards, CR Trott, D Sunderland - Journal of parallel and distributed …, 2014 - Elsevier
The manycore revolution can be characterized by increasing thread counts, decreasing
memory per thread, and diversity of continually evolving manycore architectures. High …
OmpSs: a proposal for programming heterogeneous multi-core architectures
In this paper, we present OmpSs, a programming model based on OpenMP and StarSs, that
can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on …
PaRSEC: Exploiting heterogeneity to enhance scalability
New high-performance computing system designs with steeply escalating processor and
core counts, burgeoning heterogeneity and accelerators, and increasingly unpredictable …
Scheduling techniques for GPU architectures with processing-in-memory capabilities
Processing data in or near memory (PIM), as opposed to in conventional computational units
in a processor, can greatly alleviate the performance and energy penalties of data transfers …
A taxonomy of task-based parallel programming technologies for high-performance computing
Task-based programming models for shared memory—such as Cilk Plus and OpenMP 3—
are well established and documented. However, with the increase in parallel, many-core …
DAGuE: A generic distributed DAG engine for high performance computing
The frenetic development of the current architectures places a strain on the current state-of-
the-art programming environments. Harnessing the full potential of such architectures is a …
Taskflow: A lightweight parallel and heterogeneous task graph computing system
Taskflow aims to streamline the building of parallel and heterogeneous applications using a
lightweight task graph-based approach. Taskflow introduces an expressive task graph …
XKaapi: A runtime system for data-flow task programming on heterogeneous architectures
Most recent HPC platforms have heterogeneous nodes composed of multi-core CPUs and
accelerators, like GPUs. Programming such nodes is typically based on a combination of …