Need for speed: Experiences building a trustworthy system-level GPU simulator

O Villa, D Lustig, Z Yan, E Bolotin, Y Fu… - … Symposium on High …, 2021 - ieeexplore.ieee.org
The demands of high-performance computing (HPC) and machine learning (ML) workloads
have resulted in the rapid architectural evolution of GPUs over the last decade. The growing …

Spandex: A flexible interface for efficient heterogeneous coherence

J Alsop, M Sinclair, S Adve - 2018 ACM/IEEE 45th Annual …, 2018 - ieeexplore.ieee.org
Recent heterogeneous architectures have trended toward tighter integration and shared
memory largely due to the efficient communication and programmability enabled by this …

HMG: Extending cache coherence protocols across modern hierarchical multi-GPU systems

X Ren, D Lustig, E Bolotin, A Jaleel… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Prior work on GPU cache coherence has shown that simple hardware- or software-based
protocols can be more than sufficient. However, in recent years, features such as multi-chip …

Analysis and modeling of collaborative execution strategies for heterogeneous CPU-FPGA architectures

S Huang, LW Chang, I El Hajj… - Proceedings of the …, 2019 - dl.acm.org
Heterogeneous CPU-FPGA systems are evolving towards tighter integration between CPUs
and FPGAs for improved performance and energy efficiency. At the same time …

Enabling reproducible and agile full-system simulation

BR Bruce, A Akram, H Nguyen, K Roarty… - … Analysis of Systems …, 2021 - ieeexplore.ieee.org
Running experiments in modern computer architecture simulators can be a difficult and
error-prone endeavor. Users must track many configurations, components and outputs between …

Altis: Modernizing GPGPU benchmarks

B Hu, CJ Rossbach - … on Performance Analysis of Systems and …, 2020 - ieeexplore.ieee.org
This paper presents ALTIS, a benchmark suite for modern GPGPU computing. Previous
benchmark suites such as Rodinia and SHOC have served the research community well, but …

Deterministic atomic buffering

YH Chou, C Ng, S Cattell, J Intan… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Deterministic execution for GPUs is a desirable property as it helps with debuggability and
reproducibility. It is also important for safety regulations, as safety critical workloads are …

IGUARD: In-GPU advanced race detection

AK Kamath, A Basu - Proceedings of the ACM SIGOPS 28th Symposium …, 2021 - dl.acm.org
Newer use cases of GPU (Graphics Processing Unit) computing, e.g., graph analytics, look
less like traditional bulk-synchronous GPU programs. To cater to the needs of emerging …

A methodology for comparing the reliability of GPU-based and CPU-based HPCs

N Cini, G Yalcin - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Today, GPUs are widely used as coprocessors/accelerators in High-Performance
Heterogeneous Computing due to their many advantages. However, many studies …

Independent forward progress of work-groups

A Duţu, MD Sinclair, BM Beckmann… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
GPUs have evolved from providing highly-constrained programmability for a single kernel to
using pre-emption to ensure independent forward progress for concurrently executing …