Need for speed: Experiences building a trustworthy system-level GPU simulator
The demands of high-performance computing (HPC) and machine learning (ML) workloads
have resulted in the rapid architectural evolution of GPUs over the last decade. The growing …
Spandex: A flexible interface for efficient heterogeneous coherence
Recent heterogeneous architectures have trended toward tighter integration and shared
memory largely due to the efficient communication and programmability enabled by this …
HMG: Extending cache coherence protocols across modern hierarchical multi-GPU systems
Prior work on GPU cache coherence has shown that simple hardware- or software-based
protocols can be more than sufficient. However, in recent years, features such as multi-chip …
Analysis and modeling of collaborative execution strategies for heterogeneous CPU-FPGA architectures
Heterogeneous CPU-FPGA systems are evolving towards tighter integration between CPUs
and FPGAs for improved performance and energy efficiency. At the same time …
Enabling reproducible and agile full-system simulation
Running experiments in modern computer architecture simulators can be a difficult and error-
prone endeavor. Users must track many configurations, components and outputs between …
Altis: Modernizing GPGPU benchmarks
This paper presents ALTIS, a benchmark suite for modern GPGPU computing. Previous
benchmark suites such as Rodinia and SHOC have served the research community well, but …
Deterministic atomic buffering
YH Chou, C Ng, S Cattell, J Intan… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Deterministic execution for GPUs is a desirable property as it helps with debuggability and
reproducibility. It is also important for safety regulations, as safety critical workloads are …
IGUARD: In-GPU advanced race detection
Newer use cases of GPU (Graphics Processing Unit) computing, e.g., graph analytics, look
less like traditional bulk-synchronous GPU programs. To cater to the needs of emerging …
A methodology for comparing the reliability of GPU-based and CPU-based HPCs
Today, GPUs are widely used as coprocessors/accelerators in High-Performance
Heterogeneous Computing due to their many advantages. However, many researches …
Independent forward progress of work-groups
GPUs have evolved from providing highly-constrained programmability for a single kernel to
using pre-emption to ensure independent forward progress for concurrently executing …