Optimizing GPU cache policies for MI workloads

J Alsop, MD Sinclair, S Bharadwaj… - 2019 IEEE …, 2019 - ieeexplore.ieee.org
In recent years, machine intelligence (MI) applications have emerged as a major driver for
the computing industry. Optimizing these workloads is important, but complicated. As …

Horus: A modular GPU emulator framework

AS Elhelw, S Pai - … on Performance Analysis of Systems and …, 2020 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) are widely used to run general-purpose computing
workloads. Three approaches currently exist to observe the dynamic behaviour of these …

GPU Performance Acceleration via Intra-Group Sharing TLB

W Huang, Y Du, M Liu - … of the 52nd International Conference on Parallel …, 2023 - dl.acm.org
Unified virtual memory greatly simplifies GPU programming, but it introduces huge address
translation overhead. To reduce this overhead, modern GPUs utilize the translation …

A Study on Atomics-based Integer Sum Reduction in HIP on AMD GPU

Z Jin, J Vetter - Workshop Proceedings of the 51st International …, 2022 - dl.acm.org
Integer sum reduction is a primitive operation commonly used in scientific computing.
Implementing a parallel reduction on a GPU often involves concurrent memory accesses …

A Research Retrospective on AMD's Exascale Computing Journey

GH Loh, MJ Schulte, M Ignatowski… - Proceedings of the 50th …, 2023 - dl.acm.org
The pace of advancement of the top-end supercomputers historically followed an
exponential curve similar to (and driven in part by) Moore's Law. Shortly after hitting the …

GPU Wavefront Splitting for Safety-Critical Systems

A Klashtorny - 2022 - uwspace.uwaterloo.ca
Graphics processing units (GPUs) are compute platforms that are ideal for highly parallel
workloads due to their high degree of hardware parallelism. Parallelism offered by GPUs …