Hardware architecture and software stack for PIM based on commercial DRAM technology: Industrial product
Emerging applications such as deep neural networks demand high off-chip memory
bandwidth. However, under stringent physical constraints of chip packages and system …
A modern primer on processing in memory
Modern computing systems are overwhelmingly designed to move data to computation. This
design choice goes directly against at least three key trends in computing that cause …
Benchmarking a new paradigm: Experimental analysis and characterization of a real processing-in-memory system
Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …
DRISA: A DRAM-based reconfigurable in-situ accelerator
Data movement between the processing units and the memory in traditional von Neumann
architecture is creating the "memory wall" problem. To bridge the gap, two approaches, the …
Processing data where it makes sense: Enabling in-memory computation
Today's systems are overwhelmingly designed to move data to computation. This design
choice goes directly against at least three key trends in systems that cause performance …
RowHammer: A retrospective
This retrospective paper describes the RowHammer problem in dynamic random access
memory (DRAM), which was initially introduced by Kim et al. at the ISCA 2014 Conference …
RecNMP: Accelerating personalized recommendation with near-memory processing
Personalized recommendation systems leverage deep learning models and account for the
majority of data center AI cycles. Their performance is dominated by memory-bound sparse …
TensorDIMM: A practical near-memory processing architecture for embeddings and tensor operations in deep learning
Recent studies from several hyperscalers pinpoint embedding layers as the most
memory-intensive deep learning (DL) algorithm being deployed in today's datacenters. This paper …
Processing-in-memory: A workload-driven perspective
Many modern and emerging applications must process increasingly large volumes of data.
Unfortunately, prevalent computing paradigms are not designed to efficiently handle such …
DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks
Data movement between the CPU and main memory is a first-order obstacle against improving
performance, scalability, and energy efficiency in modern systems. Computer systems …