DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks
Data movement between the CPU and main memory is a first-order obstacle against
improving performance, scalability, and energy efficiency in modern systems. Computer systems …
Understanding reduced-voltage operation in modern DRAM devices: Experimental characterization, analysis, and mechanisms
The energy consumption of DRAM is a critical concern in modern computing systems.
Improvements in manufacturing process technology have allowed DRAM vendors to lower …
Memory scaling: A systems architecture perspective
O Mutlu - 2013 5th IEEE International Memory Workshop, 2013 - ieeexplore.ieee.org
The memory system is a fundamental performance and energy bottleneck in almost all
computing systems. Recent system design, application, and technology trends that require …
Research problems and opportunities in memory systems
The memory system is a fundamental performance and energy bottleneck in almost all
computing systems. Recent system design, application, and technology trends that require …
Enabling interposer-based disintegration of multi-core processors
Silicon interposers enable the integration of multiple stacks of in-package memory to provide
higher bandwidth or lower energy for memory accesses. Once the interposer has been paid …
Mosaic: a GPU memory manager with application-transparent support for multiple page sizes
Contemporary discrete GPUs support rich memory management features such as virtual
memory and demand paging. These features simplify GPU programming by providing a …
MemScale: active low-power modes for main memory
Main memory is responsible for a large and increasing fraction of the energy consumed by
servers. Prior work has focused on exploiting DRAM low-power states to conserve energy …
Load value approximation
Approximate computing explores opportunities that emerge when applications can tolerate
error or inexactness. These applications, which range from multimedia processing to …
CHIPPER: A low-complexity bufferless deflection router
As Chip Multiprocessors (CMPs) scale to tens or hundreds of nodes, the interconnect
becomes a significant factor in cost, energy consumption and performance. Recent work has …
MASK: Redesigning the GPU memory hierarchy to support multi-application concurrency
Graphics Processing Units (GPUs) exploit large amounts of thread-level parallelism to
provide high instruction throughput and to efficiently hide long-latency stalls. The resulting …