AC-GC: Lossy activation compression with guaranteed convergence
Parallel hardware devices (e.g., graphics processor units) have limited high-bandwidth
memory capacity. This negatively impacts the training of deep neural networks (DNNs) by …
GPUGuard: Mitigating contention based side and covert channel attacks on GPUs
Graphics processing units (GPUs) are moving towards supporting concurrent kernel
execution where multiple kernels may be co-executed on the same GPU and even on the …
Principal kernel analysis: A tractable methodology to simulate scaled GPU workloads
C Avalos Baddouh, M Khairy, RN Green… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Simulating all threads in a scaled GPU workload results in prohibitive simulation cost. Cycle-
level simulation is orders of magnitude slower than native silicon; the only solution is to …
RegLess: Just-in-time operand staging for GPUs
J Kloosterman, J Beaumont, DA Jamshidi… - Proceedings of the 50th …, 2017 - dl.acm.org
The register file is one of the largest and most power-hungry structures in a Graphics
Processing Unit (GPU), because massive multithreading requires all the register state for …
DARM: control-flow melding for SIMT thread divergence reduction
GPGPUs use the Single-Instruction-Multiple-Thread (SIMT) execution model where a group
of threads—wavefront or warp—execute instructions in lockstep. When threads in a group …
A survey of architectural approaches for improving GPGPU performance, programmability and heterogeneity
With the skyrocketing advances of process technology, the increased need to process huge
amounts of data, and the pivotal need for power efficiency, the usage of Graphics Processing …
Combating the reliability challenge of GPU register file at low supply voltage
Supply voltage reduction is an effective approach to significantly reduce GPU energy
consumption. As the largest on-chip storage structure, the GPU register file becomes the …
G-Scalar: Cost-effective generalized scalar execution architecture for power-efficient GPUs
The GPU has provided higher throughput by integrating more execution resources into a
single chip without unduly compromising power efficiency. With the power wall challenge …
HAWS: Accelerating GPU wavefront execution through selective out-of-order execution
Graphics Processing Units (GPUs) have become an attractive platform for accelerating
challenging applications on a range of platforms, from High Performance Computing (HPC) …
Relaxations for high-performance message passing on massively parallel SIMT processors
Accelerators, such as GPUs, have proven to be highly successful in reducing execution time
and power consumption of compute-intensive applications. Even though they are already …