A survey of techniques for architecting and managing GPU register file
S Mittal - IEEE Transactions on Parallel and Distributed …, 2016 - ieeexplore.ieee.org
To support their massively multithreaded architecture, GPUs use a very large register file (RF),
which has a capacity higher than even the L1 and L2 caches. In total contrast, traditional CPUs …
Regless: Just-in-time operand staging for GPUs
J Kloosterman, J Beaumont, DA Jamshidi… - Proceedings of the 50th …, 2017 - dl.acm.org
The register file is one of the largest and most power-hungry structures in a Graphics
Processing Unit (GPU), because massive multithreading requires all the register state for …
A reschedulable dataflow-simd execution for increased utilization in cgra cross-domain acceleration
C Yin, N **g, J Jiang, Q Wang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
When a coarse-grained reconfigurable array (CGRA) architecture shifts toward cross-
domain acceleration, control flow and memory accesses often degrade the processing …
Cache-emulated register file: An integrated on-chip memory architecture for high performance GPGPUs
The on-chip memory design is critical to the GPGPU performance because it serves
between the massive threads and the huge external memory as a low-latency and high …
Particle swarm optimization protocol for clustering in wireless sensor networks: A realistic approach
In a Wireless Sensor Network (WSN), clustering sensor nodes is an efficient topology control
method to reduce energy consumption of the sensor nodes. Many link quality-based …
Towards warp-scheduler friendly STT-RAM/SRAM hybrid GPGPU register file design
Modern Graphics Processing Units (GPUs) widely adopt large SRAM based register file (RF)
to enable fast context-switch. A large SRAM RF may consume 20% to 40% GPU power …
Bank stealing for a compact and efficient register file architecture in GPGPU
Modern general-purpose graphics processing units (GPGPUs) have emerged as pervasive
alternatives for parallel high-performance computing. The extreme multithreading in modern …
IBOM: An integrated and balanced on-chip memory for high performance GPGPUs
GPGPU accelerated computing has revolutionized a broad range of applications. To serve
between the ever-growing computing capability and external memory, the on-chip memory …
FRF: Toward warp-scheduler friendly STT-RAM/SRAM fine-grained hybrid GPGPU register file design
Modern graphics processing units (GPUs) exhibit increasing demands for register files (RFs)
with larger capacity and bank sizes, which jeopardize the traditional SRAM-based RF …
Decoupling the multi-rate dataflow execution in coarse-grained reconfigurable array
T Hong, N Guan, C Yin, Q Wang, J Jiang… - … on Circuits and …, 2020 - ieeexplore.ieee.org
Coarse-grained reconfigurable array (CGRA) driven by dataflow execution is gaining
renewed interest as an accelerator architecture of higher energy efficiency. However, with …