DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks

GF Oliveira, J Gómez-Luna, L Orosa, S Ghose… - IEEE …, 2021 - ieeexplore.ieee.org
Data movement between the CPU and main memory is a first-order obstacle against improving
performance, scalability, and energy efficiency in modern systems. Computer systems …

Approximate communication: Techniques for reducing communication bottlenecks in large-scale parallel systems

F Betzel, K Khatamifard, H Suresh, DJ Lilja… - ACM Computing …, 2018 - dl.acm.org
Approximate computing has gained research attention recently as a way to increase energy
efficiency and/or performance by exploiting some applications' intrinsic error resiliency …

A framework for memory oversubscription management in graphics processing units

C Li, R Ausavarungnirun, CJ Rossbach… - Proceedings of the …, 2019 - dl.acm.org
Modern discrete GPUs support unified memory and demand paging. Automatic
management of data movement between CPU memory and GPU memory dramatically …

RFVP: Rollback-free value prediction with safe-to-approximate loads

A Yazdanbakhsh, G Pekhimenko, B Thwaites… - ACM Transactions on …, 2016 - dl.acm.org
This article aims to tackle two fundamental memory bottlenecks: limited off-chip bandwidth
(bandwidth wall) and long access latency (memory wall). To achieve this goal, our approach …

Reducing DRAM latency at low cost by exploiting heterogeneity

D Lee - arXiv preprint arXiv:1604.08041, 2016 - arxiv.org
In modern systems, DRAM-based main memory is significantly slower than the processor.
Consequently, processors spend a long time waiting to access data from main memory …

ITAP: Idle-time-aware power management for GPU execution units

M Sadrosadati, SB Ehsani, H Falahati… - ACM Transactions on …, 2019 - dl.acm.org
Graphics Processing Units (GPUs) are widely used as the accelerator of choice for
applications with massively data-parallel tasks. However, recent studies show that GPUs …

Cross-core Data Sharing for Energy-efficient GPUs

H Falahati, M Sadrosadati, Q Xu… - ACM Transactions on …, 2024 - dl.acm.org
Graphics Processing Units (GPUs) are the accelerator of choice in a variety of application
domains, because they can accelerate massively parallel workloads and can be easily …

Measuring and modeling on-chip interconnect power on real hardware

V Adhinarayanan, I Paul, JL Greathouse… - 2016 IEEE …, 2016 - ieeexplore.ieee.org
On-chip data movement is a major source of power consumption in modern processors, and
future technology nodes will exacerbate this problem. Properly understanding the power that …

Latte-CC: Latency tolerance aware adaptive cache compression management for energy-efficient GPUs

A Arunkumar, SY Lee… - … Symposium on High …, 2018 - ieeexplore.ieee.org
General-purpose GPU applications are significantly constrained by the efficiency of the
memory subsystem and the availability of data cache capacity on GPUs. Cache …

RowClone: Accelerating data movement and initialization using DRAM

V Seshadri, Y Kim, C Fallin, D Lee… - arXiv preprint arXiv …, 2018 - arxiv.org
In existing systems, to perform any bulk data movement operation (copy or initialization), the
data first has to be read into the on-chip processor, all the way into the L1 cache, and the …