A modern primer on processing in memory

O Mutlu, S Ghose, J Gómez-Luna… - … computing: from devices …, 2022 - Springer
Modern computing systems are overwhelmingly designed to move data to computation. This
design choice goes directly against at least three key trends in computing that cause …

Benchmarking a new paradigm: Experimental analysis and characterization of a real processing-in-memory system

J Gómez-Luna, I El Hajj, I Fernandez… - IEEE …, 2022 - ieeexplore.ieee.org
Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …

Benchmarking a new paradigm: An experimental analysis of a real processing-in-memory architecture

J Gómez-Luna, IE Hajj, I Fernandez… - arXiv preprint arXiv …, 2021 - arxiv.org
Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …

CLR-DRAM: A low-cost DRAM architecture enabling dynamic capacity-latency trade-off

H Luo, T Shahroodi, H Hassan, M Patel… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
DRAM is the prevalent main memory technology, but its long access latency can limit the
performance of many workloads. Although prior works provide DRAM designs that reduce …

Exploiting page table locality for agile TLB prefetching

G Vavouliotis, L Alvarez, V Karakostas… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Frequent Translation Lookaside Buffer (TLB) misses incur high performance and energy
costs due to page walks required for fetching the corresponding address translations …

A survey of memory-centric energy efficient computer architecture

C Zhang, H Sun, S Li, Y Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Energy efficient architecture is essential to improve both the performance and power
consumption of a computer system. However, modern computers suffer from the severe …

Rebooting virtual memory with Midgard

S Gupta, A Bhattacharyya, Y Oh… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Computer systems designers are building cache hierarchies with higher capacity to capture
the ever-increasing working sets of modern workloads. Cache hierarchies with higher …

Parallel virtualized memory translation with nested elastic cuckoo page tables

J Stojkovic, D Skarlatos, A Kokolis, T Xu… - Proceedings of the 27th …, 2022 - dl.acm.org
A major reason why nested or virtualized address translations are slow is because current
systems organize page tables in a multi-level tree that is accessed in a sequential manner. A …

Victima: Drastically Increasing Address Translation Reach by Leveraging Underutilized Cache Resources

K Kanellopoulos, HC Nam, N Bostanci, R Bera… - Proceedings of the 56th …, 2023 - dl.acm.org
Address translation is a performance bottleneck in data-intensive workloads due to large
datasets and irregular access patterns that lead to frequent high-latency page table walks …

Intelligent architectures for intelligent computing systems

O Mutlu - 2021 Design, Automation & Test in Europe …, 2021 - ieeexplore.ieee.org
Computing is bottlenecked by data. Large amounts of application data overwhelm storage
capability, communication capability, and computation capability of the modern machines …