DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks
Data movement between the CPU and main memory is a first-order obstacle to improving performance, scalability, and energy efficiency in modern systems. Computer systems …
Approximate communication: Techniques for reducing communication bottlenecks in large-scale parallel systems
Approximate computing has gained research attention recently as a way to increase energy
efficiency and/or performance by exploiting some applications' intrinsic error resiliency …
A framework for memory oversubscription management in graphics processing units
Modern discrete GPUs support unified memory and demand paging. Automatic
management of data movement between CPU memory and GPU memory dramatically …
RFVP: Rollback-free value prediction with safe-to-approximate loads
This article aims to tackle two fundamental memory bottlenecks: limited off-chip bandwidth
(bandwidth wall) and long access latency (memory wall). To achieve this goal, our approach …
Reducing DRAM latency at low cost by exploiting heterogeneity
D. Lee - arXiv preprint arXiv:1604.08041, 2016 - arxiv.org
In modern systems, DRAM-based main memory is significantly slower than the processor.
Consequently, processors spend a long time waiting to access data from main memory …
ITAP: Idle-time-aware power management for GPU execution units
Graphics Processing Units (GPUs) are widely used as the accelerator of choice for
applications with massively data-parallel tasks. However, recent studies show that GPUs …
Cross-core Data Sharing for Energy-efficient GPUs
Graphics Processing Units (GPUs) are the accelerator of choice in a variety of application
domains, because they can accelerate massively parallel workloads and can be easily …
Measuring and modeling on-chip interconnect power on real hardware
On-chip data movement is a major source of power consumption in modern processors, and
future technology nodes will exacerbate this problem. Properly understanding the power that …
Latte-CC: Latency tolerance aware adaptive cache compression management for energy efficient GPUs
A. Arunkumar, S. Y. Lee… - … Symposium on High …, 2018 - ieeexplore.ieee.org
General-purpose GPU applications are significantly constrained by the efficiency of the
memory subsystem and the availability of data cache capacity on GPUs. Cache …
RowClone: Accelerating data movement and initialization using DRAM
In existing systems, to perform any bulk data movement operation (copy or initialization), the data must first be read into the on-chip processor, all the way into the L1 cache, and the …