DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks
Data movement between the CPU and main memory is a first-order obstacle against improving performance, scalability, and energy efficiency in modern systems. Computer systems …
Approximate communication: Techniques for reducing communication bottlenecks in large-scale parallel systems
Approximate computing has gained research attention recently as a way to increase energy efficiency and/or performance by exploiting some applications' intrinsic error resiliency …
Transparent offloading and mapping (TOM): Enabling programmer-transparent near-data processing in GPU systems
Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to …
Compressing DMA engine: Leveraging activation sparsity for training deep neural networks
Popular deep learning frameworks require users to fine-tune their memory usage so that the training data of a deep neural network (DNN) fits within the GPU physical memory. Prior …
Mosaic: a GPU memory manager with application-transparent support for multiple page sizes
Contemporary discrete GPUs support rich memory management features such as virtual memory and demand paging. These features simplify GPU programming by providing a …
What your DRAM power models are not telling you: Lessons from a detailed experimental study
Main memory (DRAM) consumes as much as half of the total system power in a computer today, due to the increasing demand for memory capacity and bandwidth. There is a …
A framework for memory oversubscription management in graphics processing units
Modern discrete GPUs support unified memory and demand paging. Automatic management of data movement between CPU memory and GPU memory dramatically …
MASK: Redesigning the GPU memory hierarchy to support multi-application concurrency
Graphics Processing Units (GPUs) exploit large amounts of thread-level parallelism to provide high instruction throughput and to efficiently hide long-latency stalls. The resulting …
A survey on PCM lifetime enhancement schemes
Phase Change Memory (PCM) is an emerging memory technology that has the capability to address the growing demand for memory capacity and bridge the gap between the main …
Buddy compression: Enabling larger memory for deep learning and HPC workloads on GPUs
GPUs accelerate high-throughput applications, which require orders-of-magnitude higher memory bandwidth than traditional CPU-only systems. However, the capacity of such high …