A Survey of Design and Optimization for Systolic Array-based DNN Accelerators

R Xu, S Ma, Y Guo, D Li - ACM Computing Surveys, 2023 - dl.acm.org
In recent years, it has been witnessed that the systolic array is a successful architecture for
DNN hardware accelerators. However, the design of systolic arrays also encountered many …

In-memory computing with emerging nonvolatile memory devices

C Cheng, PJ Tiw, Y Cai, X Yan, Y Yang… - Science China Information …, 2021 - Springer
The von Neumann bottleneck and memory wall have posed fundamental limitations in
latency and energy consumption of modern computers based on von Neumann architecture …

A modern primer on processing in memory

O Mutlu, S Ghose, J Gómez-Luna… - … computing: from devices …, 2022 - Springer
Modern computing systems are overwhelmingly designed to move data to computation. This
design choice goes directly against at least three key trends in computing that cause …

Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology

V Seshadri, D Lee, T Mullins, H Hassan… - Proceedings of the 50th …, 2017 - dl.acm.org
Many important applications trigger bulk bitwise operations, i.e., bitwise operations on large
bit vectors. In fact, recent works design techniques that exploit fast bulk bitwise operations to …
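As a rough illustration of what "bulk bitwise operations on large bit vectors" means, the sketch below models each DRAM row as a Python integer used as a bit vector. It relies on one well-known idea from Ambit: activating three rows at once yields their bitwise majority, so fixing the third row to all zeros gives AND and fixing it to all ones gives OR. The row width `ROW_BITS` and the helper names (`majority`, `bulk_and`, `bulk_or`, `bulk_not`) are hypothetical choices for this example. It models only the logic, not Ambit's circuits or command sequence.

```python
# Minimal logic-level sketch, assuming rows are modeled as fixed-width ints.
ROW_BITS = 64                   # hypothetical row width (real DRAM rows are far wider)
MASK = (1 << ROW_BITS) - 1      # all-ones "control row"


def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority of three equal-width bit vectors."""
    return (a & b) | (b & c) | (a & c)


def bulk_and(a: int, b: int) -> int:
    # With the third operand all zeros, majority(a, b, 0) == a AND b.
    return majority(a, b, 0)


def bulk_or(a: int, b: int) -> int:
    # With the third operand all ones, majority(a, b, 1...1) == a OR b.
    return majority(a, b, MASK)


def bulk_not(a: int) -> int:
    # Invert every bit, keeping the result within the row width.
    return ~a & MASK


# Usage: a bitmap-index style query over two bit vectors.
active_day1 = 0b1100
active_day2 = 0b1010
both_days = bulk_and(active_day1, active_day2)    # 0b1000
either_day = bulk_or(active_day1, active_day2)    # 0b1110
```

The point of the majority formulation is that a single primitive, plus a constant control row, covers both AND and OR.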

LegoOS: A disseminated, distributed OS for hardware resource disaggregation

Y Shan, Y Huang, Y Chen, Y Zhang - 13th USENIX Symposium on …, 2018 - usenix.org
The monolithic server model where a server is the unit of deployment, operation, and failure
is meeting its limits in the face of several recent hardware and application trends. To improve …

Neural cache: Bit-serial in-cache acceleration of deep neural networks

C Eckert, X Wang, J Wang… - 2018 ACM/IEEE …, 2018 - ieeexplore.ieee.org
This paper presents the Neural Cache architecture, which re-purposes cache structures to
transform them into massively parallel compute units capable of running inferences for Deep …
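To show why bit-serial computation can turn cache arrays into massively parallel units, the sketch below stores many operands in a transposed layout. Each bit position of all operands forms one "bit-plane", modeled here as an integer with one bit per lane. Addition then runs one bit position at a time, and each step applies the same bitwise operations to every lane at once. This is a generic bit-serial ripple-carry adder written for illustration. The function names are hypothetical, and the sketch does not reproduce Neural Cache's actual circuits or data mapping.

```python
from typing import List


def to_bitplanes(values: List[int], width: int) -> List[int]:
    """Transpose lane values into bit-planes: plane[i] holds bit i of every lane."""
    planes = []
    for bit in range(width):
        plane = 0
        for lane, v in enumerate(values):
            if (v >> bit) & 1:
                plane |= 1 << lane
        planes.append(plane)
    return planes


def from_bitplanes(planes: List[int], lanes: int) -> List[int]:
    """Inverse transpose: rebuild per-lane values from bit-planes."""
    values = [0] * lanes
    for bit, plane in enumerate(planes):
        for lane in range(lanes):
            if (plane >> lane) & 1:
                values[lane] |= 1 << bit
    return values


def bit_serial_add(a_planes: List[int], b_planes: List[int]) -> List[int]:
    """Add all lanes in parallel, one bit position per step (LSB first).

    Assumes both operands use the same number of bit-planes. The carry is
    itself a bit-plane, so every lane keeps its own carry bit.
    """
    carry = 0
    out = []
    for a, b in zip(a_planes, b_planes):
        out.append(a ^ b ^ carry)                # sum bit for every lane
        carry = (a & b) | (carry & (a ^ b))      # carry-out for every lane
    out.append(carry)                            # final carry becomes the top bit
    return out


# Usage: four lanes of 4-bit operands added in 4 bit-serial steps.
a = [3, 5, 7, 0]
b = [1, 2, 8, 15]
sums = from_bitplanes(bit_serial_add(to_bitplanes(a, 4), to_bitplanes(b, 4)), 4)
```

The latency grows with operand width rather than with the number of lanes. This is the trade-off that makes bit-serial designs attractive when thousands of lanes can operate side by side.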

Benchmarking a new paradigm: Experimental analysis and characterization of a real processing-in-memory system

J Gómez-Luna, I El Hajj, I Fernandez… - IEEE …, 2022 - ieeexplore.ieee.org
Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …

DRISA: A DRAM-based reconfigurable in-situ accelerator

S Li, D Niu, KT Malladi, H Zheng, B Brennan… - Proceedings of the 50th …, 2017 - dl.acm.org
Data movement between the processing units and the memory in traditional von Neumann
architecture is creating the "memory wall" problem. To bridge the gap, two approaches, the …

Google workloads for consumer devices: Mitigating data movement bottlenecks

A Boroumand, S Ghose, Y Kim… - Proceedings of the …, 2018 - dl.acm.org
We are experiencing an explosive growth in the number of consumer devices, including
smartphones, tablets, web-based computers such as Chromebooks, and wearable devices …

Breaking the von Neumann bottleneck: architecture-level processing-in-memory technology

X Zou, S Xu, X Chen, L Yan, Y Han - Science China Information Sciences, 2021 - Springer
The “memory wall” problem or so-called von Neumann bottleneck limits the efficiency of
conventional computer architectures, which move data from memory to CPU for …