DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks
Data movement between the CPU and main memory is a first-order obstacle to improving performance, scalability, and energy efficiency in modern systems. Computer systems …
Approximate communication: Techniques for reducing communication bottlenecks in large-scale parallel systems
Approximate computing has gained research attention recently as a way to increase energy
efficiency and/or performance by exploiting some applications' intrinsic error resiliency …
A framework for memory oversubscription management in graphics processing units
Modern discrete GPUs support unified memory and demand paging. Automatic
management of data movement between CPU memory and GPU memory dramatically …
RFVP: Rollback-free value prediction with safe-to-approximate loads
This article aims to tackle two fundamental memory bottlenecks: limited off-chip bandwidth
(bandwidth wall) and long access latency (memory wall). To achieve this goal, our approach …
Reducing DRAM latency at low cost by exploiting heterogeneity
D. Lee - arXiv preprint arXiv:1604.08041, 2016 - arxiv.org
In modern systems, DRAM-based main memory is significantly slower than the processor.
Consequently, processors spend a long time waiting to access data from main memory …
ITAP: Idle-time-aware power management for GPU execution units
Graphics Processing Units (GPUs) are widely used as the accelerator of choice for
applications with massively data-parallel tasks. However, recent studies show that GPUs …
Cross-core Data Sharing for Energy-efficient GPUs
Graphics Processing Units (GPUs) are the accelerator of choice in a variety of application
domains, because they can accelerate massively parallel workloads and can be easily …
Measuring and modeling on-chip interconnect power on real hardware
On-chip data movement is a major source of power consumption in modern processors, and
future technology nodes will exacerbate this problem. Properly understanding the power that …
Latte-CC: Latency tolerance aware adaptive cache compression management for energy efficient GPUs
A. Arunkumar, S. Y. Lee… - … Symposium on High …, 2018 - ieeexplore.ieee.org
General-purpose GPU applications are significantly constrained by the efficiency of the
memory subsystem and the availability of data cache capacity on GPUs. Cache …
RowClone: Accelerating data movement and initialization using DRAM
In existing systems, to perform any bulk data movement operation (copy or initialization), the data must first be read into the on-chip processor, all the way into the L1 cache, and the …