AC-GC: Lossy activation compression with guaranteed convergence

RD Evans, T Aamodt - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Parallel hardware devices (e.g., graphics processor units) have limited high-bandwidth
memory capacity. This negatively impacts the training of deep neural networks (DNNs) by …

GPUGuard: Mitigating contention based side and covert channel attacks on GPUs

Q Xu, H Naghibijouybari, S Wang… - Proceedings of the …, 2019 - dl.acm.org
Graphics processing units (GPUs) are moving towards supporting concurrent kernel
execution where multiple kernels may be co-executed on the same GPU and even on the …

Principal kernel analysis: A tractable methodology to simulate scaled GPU workloads

C Avalos Baddouh, M Khairy, RN Green… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Simulating all threads in a scaled GPU workload results in prohibitive simulation cost.
Cycle-level simulation is orders of magnitude slower than native silicon; the only solution is to …

RegLess: Just-in-time operand staging for GPUs

J Kloosterman, J Beaumont, DA Jamshidi… - Proceedings of the 50th …, 2017 - dl.acm.org
The register file is one of the largest and most power-hungry structures in a Graphics
Processing Unit (GPU), because massive multithreading requires all the register state for …

DARM: control-flow melding for SIMT thread divergence reduction

C Saumya, K Sundararajah… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org
GPGPUs use the Single-Instruction-Multiple-Thread (SIMT) execution model where a group
of threads—wavefront or warp—execute instructions in lockstep. When threads in a group …

A survey of architectural approaches for improving GPGPU performance, programmability and heterogeneity

M Khairy, AG Wassal, M Zahran - Journal of Parallel and Distributed …, 2019 - Elsevier
With the skyrocketing advances of process technology, the increased need to process huge
amounts of data, and the pivotal need for power efficiency, the usage of Graphics Processing …

Combating the reliability challenge of GPU register file at low supply voltage

J Tan, SL Song, K Yan, X Fu, A Marquez… - Proceedings of the 2016 …, 2016 - dl.acm.org
Supply voltage reduction is an effective approach to significantly reduce GPU energy
consumption. As the largest on-chip storage structure, the GPU register file becomes the …

G-Scalar: Cost-effective generalized scalar execution architecture for power-efficient GPUs

Z Liu, S Gilani, M Annavaram… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
The GPU has provided higher throughput by integrating more execution resources into a
single chip without unduly compromising power efficiency. With the power wall challenge …

HAWS: Accelerating GPU wavefront execution through selective out-of-order execution

X Gong, X Gong, L Yu, D Kaeli - ACM Transactions on Architecture and …, 2019 - dl.acm.org
Graphics Processing Units (GPUs) have become an attractive platform for accelerating
challenging applications on a range of platforms, from High Performance Computing (HPC) …

Relaxations for high-performance message passing on massively parallel SIMT processors

B Klenk, H Fröning, H Eberle… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Accelerators, such as GPUs, have proven to be highly successful in reducing execution time
and power consumption of compute-intensive applications. Even though they are already …