Zorua: A holistic approach to resource virtualization in GPUs

N Vijaykumar, K Hsieh, G Pekhimenko… - 2016 49th Annual …, 2016 - ieeexplore.ieee.org
This paper introduces a new resource virtualization framework, Zorua, that decouples the
programmer-specified resource usage of a GPU application from the actual allocation in the …

Virtual thread: Maximizing thread-level parallelism beyond GPU scheduling limit

MK Yoon, K Kim, S Lee, WW Ro… - ACM SIGARCH Computer …, 2016 - dl.acm.org
Modern GPUs require tens of thousands of concurrent threads to fully utilize the massive
amount of processing resources. However, thread concurrency in GPUs can be diminished …

Warped-preexecution: A GPU pre-execution approach for improving latency hiding

K Kim, S Lee, MK Yoon, G Koo, WW Ro… - … Symposium on High …, 2016 - ieeexplore.ieee.org
This paper presents a pre-execution approach for improving GPU performance, called
P-mode (pre-execution mode). GPUs utilize a number of concurrent threads for hiding …

Regless: Just-in-time operand staging for GPUs

J Kloosterman, J Beaumont, DA Jamshidi… - Proceedings of the 50th …, 2017 - dl.acm.org
The register file is one of the largest and most power-hungry structures in a Graphics
Processing Unit (GPU), because massive multithreading requires all the register state for …

Hierarchical register file at a graphics processing unit

Y Eckert, N Jayasena - US Patent 10,853,904, 2020 - Google Patents
A processor employs a hierarchical register file for a graphics processing unit (GPU). A top
level of the hierarchical register file is stored at a local memory of the GPU (e.g., a memory on …

Many-BSP: an analytical performance model for CUDA kernels

A Riahi, A Savadi, M Naghibzadeh - Computing, 2024 - Springer
The unknown behavior of GPUs and the differing characteristics among their generations
present a serious challenge in the analysis and optimization of programs in these …

A stall-aware warp scheduling for dynamically optimizing thread-level parallelism in GPGPUs

Y Yu, W Xiao, X He, H Guo, Y Wang… - Proceedings of the 29th …, 2015 - dl.acm.org
General-Purpose Graphic Processing Units (GPGPU) have been widely used in high
performance computing as application accelerators due to their massive parallelism and …

Unified on-chip memory allocation for SIMT architecture

AB Hayes, EZ Zhang - Proceedings of the 28th ACM international …, 2014 - dl.acm.org
The popularity of general purpose Graphic Processing Unit (GPU) is largely attributed to the
tremendous concurrency enabled by its underlying architecture--single instruction multiple …

Phase aware warp scheduling: Mitigating effects of phase behavior in GPGPU applications

M Awatramani, X Zhu, J Zambreno… - … Conference on Parallel …, 2015 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) have been widely adopted as accelerators for high
performance computing due to the immense amount of computational throughput they offer …

Efficient exception handling support for GPUs

I Tanasic, I Gelado, M Jorda, E Ayguade… - Proceedings of the 50th …, 2017 - dl.acm.org
Operating systems have long relied on the exception handling mechanism to implement
numerous virtual memory features and optimizations. However, today's GPUs have a limited …