DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks
Data movement between the CPU and main memory is a first-order obstacle against improving performance, scalability, and energy efficiency in modern systems. Computer systems …
Approximate communication: Techniques for reducing communication bottlenecks in large-scale parallel systems
Approximate computing has gained research attention recently as a way to increase energy efficiency and/or performance by exploiting some applications' intrinsic error resiliency …
Transparent offloading and mapping (TOM): Enabling programmer-transparent near-data processing in GPU systems
Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to …
Compressing DMA engine: Leveraging activation sparsity for training deep neural networks
Popular deep learning frameworks require users to fine-tune their memory usage so that the training data of a deep neural network (DNN) fits within the GPU physical memory. Prior …
Mosaic: a GPU memory manager with application-transparent support for multiple page sizes
Contemporary discrete GPUs support rich memory management features such as virtual memory and demand paging. These features simplify GPU programming by providing a …
What your DRAM power models are not telling you: Lessons from a detailed experimental study
Main memory (DRAM) consumes as much as half of the total system power in a computer today, due to the increasing demand for memory capacity and bandwidth. There is a …
A framework for memory oversubscription management in graphics processing units
Modern discrete GPUs support unified memory and demand paging. Automatic management of data movement between CPU memory and GPU memory dramatically …
MASK: Redesigning the GPU memory hierarchy to support multi-application concurrency
Graphics Processing Units (GPUs) exploit large amounts of thread-level parallelism to provide high instruction throughput and to efficiently hide long-latency stalls. The resulting …
A survey on PCM lifetime enhancement schemes
Phase Change Memory (PCM) is an emerging memory technology that has the capability to address the growing demand for memory capacity and bridge the gap between the main …
Buddy compression: Enabling larger memory for deep learning and HPC workloads on GPUs
GPUs accelerate high-throughput applications, which require orders-of-magnitude higher memory bandwidth than traditional CPU-only systems. However, the capacity of such high …