Parallel programming models for heterogeneous many-cores: a comprehensive survey
Heterogeneous many-cores are now an integral part of modern computing systems ranging
from embedded systems to supercomputers. While heterogeneous many-core design offers …
Accelerating AutoDock4 with GPUs and Gradient-Based Local Search
AutoDock4 is a widely used program for docking small molecules to macromolecular targets.
It describes ligand–receptor interactions using a physics-inspired scoring function that has …
Evaluating attainable memory bandwidth of parallel programming models via BabelStream
Many scientific codes consist of memory bandwidth bound kernels. One major advantage of
many-core devices such as general purpose graphics processing units (GPGPUs) and the …
gpucc: an open-source GPGPU compiler
Graphics Processing Units have emerged as powerful accelerators for massively parallel,
numerically intensive workloads. The two dominant software models for these devices are …
High-performance GPU-to-CPU transpilation and optimization via high-level parallel constructs
While parallelism remains the main source of performance, architectural implementations
and programming models change with each new hardware generation, often leading to …
HW/SW co-design toolset for customization of exposed datapath processors
Customized processors are an interesting option for implementing software defined radios;
they bring benefits of tailored fixed function hardware while adding new advantages such as …
Evaluating the Performance of the hipSYCL Toolchain for HPC Kernels on NVIDIA V100 GPUs
Future HPC leadership computing systems for the United States Department of Energy will
utilize GPUs for acceleration of scientific codes. These systems will utilize GPUs from …
KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism
Dynamic parallelism on GPUs simplifies the programming of many classes of applications
that generate parallelizable work not known prior to execution. However, modern GPUs …
Programming bare-metal accelerators with heterogeneous threading models: a case study of Matrix-3000
J Fang, P Zhang, C Huang, T Tang, K Lu… - Frontiers of Information …, 2023 - Springer
As the hardware industry moves toward using specialized heterogeneous many-core
processors to avoid the effects of the power wall, software developers are finding it hard to …
COX: Exposing CUDA warp-level functions to CPUs
As CUDA becomes the de facto programming language among data parallel applications
such as high-performance computing or machine learning applications, running CUDA on …