A full spectrum of computing-in-memory technologies

Z Sun, S Kvatinsky, X Si, A Mehonic, Y Cai… - Nature Electronics, 2023 - nature.com
Computing in memory (CIM) could be used to overcome the von Neumann bottleneck and to
provide sustainable improvements in computing throughput and energy efficiency …

A Comprehensive Review of Processing-in-Memory Architectures for Deep Neural Networks

R Kaur, A Asad, F Mohammadi - Computers, 2024 - mdpi.com
This comprehensive review explores the advancements in processing-in-memory (PIM)
techniques and chiplet-based architectures for deep neural networks (DNNs). It addresses …

Pathfinding Future PIM Architectures by Demystifying a Commercial PIM Technology

B Hyun, T Kim, D Lee, M Rhu - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Processing-in-memory (PIM) has been explored for decades by computer architects, yet it
has never seen the light of day in real-world products due to its high design overheads and …

Functionally-Complete Boolean Logic in Real DRAM Chips: Experimental Characterization and Analysis

İE Yüksel, YC Tuğrul, A Olgun… - … Symposium on High …, 2024 - ieeexplore.ieee.org
Processing-using-DRAM (PuD) is an emerging paradigm that leverages the analog
operational properties of DRAM circuitry to enable massively parallel in-DRAM computation …

pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables

JD Ferreira, G Falcao, J Gómez-Luna… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Data movement between the main memory and the processor is a key contributor to
execution time and energy consumption in memory-intensive applications. This data …

MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data …

GF Oliveira, A Olgun, AG Yağlıkçı… - … Symposium on High …, 2024 - ieeexplore.ieee.org
Processing-using-DRAM (PUD) is a processing-in-memory (PIM) approach that uses a
DRAM array's massive internal parallelism to execute very-wide (e.g., 16,384-262,144-bit …

Near-optimal wafer-scale reduce

P Luczynski, L Gianinazzi, P Iff, L Wilson… - Proceedings of the 33rd …, 2024 - dl.acm.org
Efficient Reduce and AllReduce communication collectives are a critical cornerstone of
high-performance computing (HPC) applications. We present the first systematic investigation of …

NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing

G Heo, S Lee, J Cho, H Choi, S Lee, H Ham… - Proceedings of the 29th …, 2024 - dl.acm.org
Modern transformer-based Large Language Models (LLMs) are constructed with a series of
decoder blocks. Each block comprises three key components: (1) QKV generation, (2) multi …

Evaluating Homomorphic Operations on a Real-World Processing-In-Memory System

H Gupta, M Kabra, J Gómez-Luna… - 2023 IEEE …, 2023 - ieeexplore.ieee.org
Computing on encrypted data is a promising approach to reduce data security and privacy
risks, with homomorphic encryption serving as a facilitator in achieving this goal. In this work …

Simultaneous Many-Row Activation in Off-the-Shelf DRAM Chips: Experimental Characterization and Analysis

İE Yüksel, YC Tuğrul, FN Bostancı… - 2024 54th Annual …, 2024 - ieeexplore.ieee.org
We experimentally analyze the computational capability of commercial off-the-shelf (COTS)
DRAM chips and the robustness of these capabilities under various timing delays between …