A survey of techniques for architecting and managing GPU register file

S Mittal - IEEE Transactions on Parallel and Distributed …, 2016 - ieeexplore.ieee.org
To support their massively-multithreaded architecture, GPUs use very large register file (RF)
which has a capacity higher than even L1 and L2 caches. In total contrast, traditional CPUs …

Regless: Just-in-time operand staging for GPUs

J Kloosterman, J Beaumont, DA Jamshidi… - Proceedings of the 50th …, 2017 - dl.acm.org
The register file is one of the largest and most power-hungry structures in a Graphics
Processing Unit (GPU), because massive multithreading requires all the register state for …

A reschedulable dataflow-SIMD execution for increased utilization in CGRA cross-domain acceleration

C Yin, N Jing, J Jiang, Q Wang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
When a coarse-grained reconfigurable array (CGRA) architecture shifts toward cross-
domain acceleration, control flow and memory accesses often degrade the processing …

Cache-emulated register file: An integrated on-chip memory architecture for high performance GPGPUs

N Jing, J Wang, F Fan, W Yu, L Jiang… - 2016 49th Annual …, 2016 - ieeexplore.ieee.org
The on-chip memory design is critical to the GPGPU performance because it serves
between the massive threads and the huge external memory as a low-latency and high …

Particle swarm optimization protocol for clustering in wireless sensor networks: A realistic approach

RS Elhabyan, MCE Yagoub - Proceedings of the 2014 IEEE …, 2014 - ieeexplore.ieee.org
In Wireless Sensor Network (WSN), Clustering sensor nodes is an efficient topology control
method to reduce energy consumption of the sensor nodes. Many link quality-based …

Towards warp-scheduler friendly STT-RAM/SRAM hybrid GPGPU register file design

Q Deng, Y Zhang, M Zhang… - 2017 IEEE/ACM …, 2017 - ieeexplore.ieee.org
Modern Graphics Processing Units (GPUs) widely adopt large SRAM based register file (RF)
to enable fast context-switch. A large SRAM RF may consume 20% to 40% GPU power …

Bank stealing for a compact and efficient register file architecture in GPGPU

N Jing, S Jiang, S Chen, J Zhang… - … Transactions on Very …, 2016 - ieeexplore.ieee.org
Modern general-purpose graphic processing units (GPGPUs) have emerged as pervasive
alternatives for parallel high-performance computing. The extreme multithreading in modern …

IBOM: An integrated and balanced on-chip memory for high performance GPGPUs

J Wang, Q Wang, L Jiang, C Li… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
GPGPU accelerated computing has revolutionized a broad range of applications. To serve
between the ever-growing computing capability and external memory, the on-chip memory …

FRF: Toward warp-scheduler friendly STT-RAM/SRAM fine-grained hybrid GPGPU register file design

Q Deng, Y Zhang, Z Zhao, S Zhang… - … on Computer-Aided …, 2019 - ieeexplore.ieee.org
Modern graphics processing units (GPUs) exhibit increasing demands for register files (RFs)
with larger capacity and bank sizes, which jeopardize the traditional SRAM-based RF …

Decoupling the multi-rate dataflow execution in coarse-grained reconfigurable array

T Hong, N Guan, C Yin, Q Wang, J Jiang… - … on Circuits and …, 2020 - ieeexplore.ieee.org
Coarse-grained reconfigurable array (CGRA) driven by dataflow execution is gaining
reviving interest as an accelerator architecture of higher energy efficiency. However, with …