Simba: Scaling deep-learning inference with multi-chip-module-based architecture

YS Shao, J Clemons, R Venkatesan, B Zimmer… - Proceedings of the …, 2019 - dl.acm.org
Package-level integration using multi-chip-modules (MCMs) is a promising approach for
building large-scale systems. Compared to a large monolithic die, an MCM combines many …

DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks

GF Oliveira, J Gómez-Luna, L Orosa, S Ghose… - IEEE …, 2021 - ieeexplore.ieee.org
Data movement between the CPU and main memory is a first-order obstacle against improv
ing performance, scalability, and energy efficiency in modern systems. Computer systems …

CoNDA: Efficient cache coherence support for near-data accelerators

A Boroumand, S Ghose, M Patel, H Hassan… - Proceedings of the 46th …, 2019 - dl.acm.org
Specialized on-chip accelerators are widely used to improve the energy efficiency of
computing systems. Recent advances in memory technology have enabled near-data …

Memory-Centric Computing: Recent Advances in Processing-in-DRAM

O Mutlu, A Olgun, GF Oliveira… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Memory-centric computing aims to enable computation capability in and near all places
where data is generated and stored. As such, it can greatly reduce the large negative …

Syncron: Efficient synchronization support for near-data-processing architectures

C Giannoula, N Vijaykumar… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Near-Data-Processing (NDP) architectures present a promising way to alleviate data
movement costs and can provide significant performance and energy benefits to parallel …

MIMDRAM: An end-to-end processing-using-DRAM system for high-throughput, energy-efficient and programmer-transparent multiple-instruction multiple-data …

GF Oliveira, A Olgun, AG Yağlıkçı… - … Symposium on High …, 2024 - ieeexplore.ieee.org
Processing-using-DRAM (PUD) is a processing-in-memory (PIM) approach that uses a
DRAM array's massive internal parallelism to execute very-wide (eg, 16,384-262,144-bit …

[HTML][HTML] A survey of resource management for processing-in-memory and near-memory processing architectures

K Khan, S Pasricha, RG Kim - Journal of Low Power Electronics and …, 2020 - mdpi.com
Due to the amount of data involved in emerging deep learning and big data applications,
operations related to data movement have quickly become a bottleneck. Data-centric …

Metanmp: Leveraging cartesian-like product to accelerate hgnns with near-memory processing

D Chen, H He, H **, L Zheng, Y Huang… - Proceedings of the 50th …, 2023 - dl.acm.org
Heterogeneous graph neural networks (HGNNs) based on metapath exhibit powerful
capturing of rich structural and semantic information in the heterogeneous graph. HGNNs …

pLUTo: Enabling massively parallel computation in DRAM via lookup tables

JD Ferreira, G Falcao, J Gómez-Luna… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Data movement between the main memory and the processor is a key contributor to
execution time and energy consumption in memory-intensive applications. This data …

Casper: accelerating stencil computations using near-cache processing

A Denzler, GF Oliveira, N Ha**azar, R Bera… - IEEE …, 2023 - ieeexplore.ieee.org
Stencil computations are commonly used in a wide variety of scientific applications, ranging
from large-scale weather prediction to solving partial differential equations. Stencil …