A modern primer on processing in memory
Modern computing systems are overwhelmingly designed to move data to computation. This
design choice goes directly against at least three key trends in computing that cause …
Benchmarking a new paradigm: Experimental analysis and characterization of a real processing-in-memory system
Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …
Benchmarking a new paradigm: An experimental analysis of a real processing-in-memory architecture
Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …
CLR-DRAM: A low-cost DRAM architecture enabling dynamic capacity-latency trade-off
DRAM is the prevalent main memory technology, but its long access latency can limit the
performance of many workloads. Although prior works provide DRAM designs that reduce …
Exploiting page table locality for agile TLB prefetching
Frequent Translation Lookaside Buffer (TLB) misses incur high performance and energy
costs due to page walks required for fetching the corresponding address translations …
A survey of memory-centric energy efficient computer architecture
C Zhang, H Sun, S Li, Y Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Energy efficient architecture is essential to improve both the performance and power
consumption of a computer system. However, modern computers suffer from the severe …
Rebooting virtual memory with Midgard
Computer systems designers are building cache hierarchies with higher capacity to capture
the ever-increasing working sets of modern workloads. Cache hierarchies with higher …
Parallel virtualized memory translation with nested elastic cuckoo page tables
A major reason why nested or virtualized address translations are slow is because current
systems organize page tables in a multi-level tree that is accessed in a sequential manner. A …
Victima: Drastically Increasing Address Translation Reach by Leveraging Underutilized Cache Resources
Address translation is a performance bottleneck in data-intensive workloads due to large
datasets and irregular access patterns that lead to frequent high-latency page table walks …
Intelligent architectures for intelligent computing systems
O Mutlu - 2021 Design, Automation & Test in Europe …, 2021 - ieeexplore.ieee.org
Computing is bottlenecked by data. Large amounts of application data overwhelm storage
capability, communication capability, and computation capability of the modern machines …