High-performance GPU-to-CPU transpilation and optimization via high-level parallel constructs

WS Moses, IR Ivanov, J Domke, T Endo… - Proceedings of the 28th …, 2023 - dl.acm.org
While parallelism remains the main source of performance, architectural implementations
and programming models change with each new hardware generation, often leading to …

[PDF] Towards Unified Analysis of GPU Consistency

H Tong, N Gavrilenko… - 29th ACM …, 2024 - hernanponcedeleon.github.io
After more than 30 years of research, there is a solid understanding of the consistency
guarantees given by CPU systems. Unfortunately, the same is not yet true for GPUs. The …

Towards Alignment of Parallelism in SYCL and ISO C++

SJ Pennycook, B Ashbaugh, J Brodman… - Proceedings of the …, 2023 - dl.acm.org
SYCL began as a C++ abstraction for OpenCL concepts, whereas parallelism in ISO C++
evolved from the algorithms in the standard library. This history has resulted in the two …

Taking back control in an intermediate representation for GPU computing

V Klimis, J Clark, A Baker, D Neto, J Wickerson… - Proceedings of the …, 2023 - dl.acm.org
We describe our experiences successfully applying lightweight formal methods to
substantially improve and reformulate an important part of Standard Portable Intermediate …

Efficient Tree-based Parallel Algorithms for N-Body Simulations Using C++ Standard Parallelism

TL Cassell, T Deakin, A Alpay… - SC24-W: Workshops …, 2024 - ieeexplore.ieee.org
The Barnes-Hut approximation for N-body simulations reduces the time complexity of the
naive all-pairs approach from O(N^2) to O(N log N) by hierarchically aggregating nearby …

Dynamical.JS: A composable framework for online exploratory visualization of arbitrarily-complex multivariate networks

RL Dotson - 2023 - search.proquest.com
Multivariate networks (henceforth, graphs) represent entities (vertices or nodes), their
relationships to each other (edges), and manifest or derived data about both (attributes) …

[BOOK] Eventify Meets Heterogeneity: Enabling Fine-grained Task-parallelism on GPUs

L Morgenstern - 2024 - juser.fz-juelich.de
Processors become fatter, not faster. For decades, each new transistor generation provided
smaller transistors that could switch faster than ever before. This enabled new processors to …

Supercharging Programming through Compiler Technology

WS Moses - 2023 - dspace.mit.edu
The decline of Moore's law and an increasing reliance on computation has led to an
explosion of specialized software packages and hardware architectures. While this diversity …

[PDF] Guided rewriting and constraint satisfaction for parallel GPU code generation

N Mogers - 2023 - core.ac.uk
Graphics Processing Units (GPUs) are notoriously hard to optimise for manually
due to their scheduling and memory hierarchies. What is needed are good automatic code …