[HTML][HTML] A survey on hardware accelerators: Taxonomy, trends, challenges, and perspectives

B Peccerillo, M Mannino, A Mondelli… - Journal of Systems …, 2022 - Elsevier
In recent years, the limits of the multicore approach emerged in the so-called “dark silicon”
issue and diminishing returns of an ever-increasing core count. Hardware manufacturers …

A survey of coarse-grained reconfigurable architecture and design: Taxonomy, challenges, and applications

L Liu, J Zhu, Z Li, Y Lu, Y Deng, J Han, S Yin… - ACM Computing …, 2019 - dl.acm.org
As general-purpose processors have hit the power wall and chip fabrication cost escalates
alarmingly, coarse-grained reconfigurable architectures (CGRAs) are attracting increasing …

Hardware architecture and software stack for PIM based on commercial DRAM technology: Industrial product

S Lee, S Kang, J Lee, H Kim, E Lee… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Emerging applications such as deep neural network demand high off-chip memory
bandwidth. However, under stringent physical constraints of chip packages and system …

A modern primer on processing in memory

O Mutlu, S Ghose, J Gómez-Luna… - … computing: from devices …, 2022 - Springer
Modern computing systems are overwhelmingly designed to move data to computation. This
design choice goes directly against at least three key trends in computing that cause …

Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology

V Seshadri, D Lee, T Mullins, H Hassan… - Proceedings of the 50th …, 2017 - dl.acm.org
Many important applications trigger bulk bitwise operations, ie, bitwise operations on large
bit vectors. In fact, recent works design techniques that exploit fast bulk bitwise operations to …

Pipelayer: A pipelined reram-based accelerator for deep learning

L Song, X Qian, H Li, Y Chen - 2017 IEEE international …, 2017 - ieeexplore.ieee.org
Convolution neural networks (CNNs) are the heart of deep learning applications. Recent
works PRIME [1] and ISAAC [2] demonstrated the promise of using resistive random access …

{LegoOS}: A disseminated, distributed {OS} for hardware resource disaggregation

Y Shan, Y Huang, Y Chen, Y Zhang - 13th USENIX Symposium on …, 2018 - usenix.org
The monolithic server model where a server is the unit of deployment, operation, and failure
is meeting its limits in the face of several recent hardware and application trends. To improve …

Prime: A novel processing-in-memory architecture for neural network computation in reram-based main memory

P Chi, S Li, C Xu, T Zhang, J Zhao, Y Liu… - ACM SIGARCH …, 2016 - dl.acm.org
Processing-in-memory (PIM) is a promising solution to address the" memory wall"
challenges for future computer systems. Prior proposed PIM architectures put additional …

Tetris: Scalable and efficient neural network acceleration with 3d memory

M Gao, J Pu, X Yang, M Horowitz… - Proceedings of the Twenty …, 2017 - dl.acm.org
The high accuracy of deep neural networks (NNs) has led to the development of NN
accelerators that improve performance by two orders of magnitude. However, scaling these …

Benchmarking a new paradigm: Experimental analysis and characterization of a real processing-in-memory system

J Gómez-Luna, I El Hajj, I Fernandez… - IEEE …, 2022 - ieeexplore.ieee.org
Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …