Survey of deep learning accelerators for edge and emerging computing

S Alam, C Yakopcic, Q Wu, M Barnell, S Khan… - Electronics, 2024 - mdpi.com
The unprecedented progress in artificial intelligence (AI), particularly in deep learning
algorithms with ubiquitous internet-connected smart devices, has created a high demand for …

RAELLA: Reforming the arithmetic for efficient, low-resolution, and low-loss analog PIM: No retraining required!

T Andrulis, JS Emer, V Sze - … of the 50th Annual International Symposium …, 2023 - dl.acm.org
Processing-In-Memory (PIM) accelerators have the potential to efficiently run Deep Neural
Network (DNN) inference by reducing costly data movement and by using resistive RAM …

Ant: Exploiting adaptive numerical data type for low-bit deep neural network quantization

C Guo, C Zhang, J Leng, Z Liu, F Yang… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Quantization is a technique to reduce the computation and memory cost of DNN models,
which are getting increasingly large. Existing quantization solutions use fixed-point integer …
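The snippet above mentions fixed-point integer quantization as the baseline that Ant's adaptive data type improves on. A minimal sketch of that baseline (uniform symmetric int8 quantization; the helper names are illustrative, not from the paper):

```python
import numpy as np

def quantize_int8(x):
    """Uniform symmetric fixed-point (int8) quantization: map floats to
    integers in [-128, 127] via a single per-tensor scale factor."""
    max_abs = np.max(np.abs(x))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.2, 0.03, 2.0], dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
```

The reconstruction error of each element is bounded by half a quantization step (`s / 2`); adaptive data types like Ant's aim to shrink this error at a given bit width by matching the value distribution.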

On the accuracy of analog neural network inference accelerators

TP Xiao, B Feinberg, CH Bennett… - IEEE Circuits and …, 2022 - ieeexplore.ieee.org
Specialized accelerators have recently garnered attention as a method to reduce the power
consumption of neural network inference. A promising category of accelerators utilizes …

Sparse attention acceleration with synergistic in-memory pruning and on-chip recomputation

A Yazdanbakhsh, A Moradifirouzabadi… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
As its core computation, a self-attention mechanism gauges pairwise correlations across the
entire input sequence. Despite favorable performance, calculating pairwise correlations is …
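The pairwise-correlation computation described in this snippet is the standard scaled dot-product attention score matrix, whose cost grows quadratically with sequence length. A small illustrative sketch (names and shapes are my own, not from the paper):

```python
import numpy as np

def attention_scores(Q, K):
    """Pairwise correlations of self-attention: every query vector is
    dotted with every key vector, yielding an L x L score matrix.
    This O(L^2) step is the bottleneck that pruning targets."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # (L, L) pairwise scores
    # softmax over keys (numerically stabilized)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

L, d = 8, 16
rng = np.random.default_rng(0)
A = attention_scores(rng.standard_normal((L, d)),
                     rng.standard_normal((L, d)))
```

In-memory pruning schemes such as the one this paper proposes skip computing (or approximate) the low-magnitude entries of this matrix rather than materializing all L×L scores.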

Towards efficient sparse matrix vector multiplication on real processing-in-memory architectures

C Giannoula, I Fernandez, J Gómez-Luna… - ACM SIGMETRICS …, 2022 - dl.acm.org
Several manufacturers have already started to commercialize near-bank Processing-In-
Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures …
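The kernel this paper targets, sparse matrix-vector multiplication, is typically expressed over a compressed sparse row (CSR) layout; its irregular, memory-bound access pattern is what makes it attractive for near-bank PIM. A plain-Python reference sketch (not the paper's PIM implementation):

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Sparse matrix-vector multiply y = A @ x, with A in CSR form:
    values[k] is a nonzero, col_idx[k] its column, and
    row_ptr[r]:row_ptr[r+1] spans row r's nonzeros."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y

# 3x3 matrix [[1,0,2],[0,3,0],[4,0,5]] encoded in CSR
values  = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0])
```

The indirect access `x[col_idx[k]]` produces the scattered memory traffic that near-bank PIM architectures aim to serve at low cost by computing close to where the data resides.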

CiMLoop: A flexible, accurate, and fast compute-in-memory modeling tool

T Andrulis, JS Emer, V Sze - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Compute-In-Memory (CiM) is a promising solution to accelerate Deep Neural Networks
(DNNs) as it can avoid energy-intensive DNN weight movement and use memory arrays to …

Inca: Input-stationary dataflow at outside-the-box thinking about deep learning accelerators

B Kim, S Li, H Li - 2023 IEEE International Symposium on High …, 2023 - ieeexplore.ieee.org
This paper first presents an input-stationary (IS) implemented crossbar accelerator (INCA),
supporting inference and training for deep neural networks (DNNs). Processing-in-memory …

The landscape of compute-near-memory and compute-in-memory: A research and commercial overview

AA Khan, JPC De Lima, H Farzaneh… - arXiv preprint arXiv …, 2024 - arxiv.org
In today's data-centric world, where data fuels numerous application domains, with machine
learning at the forefront, handling the enormous volume of data efficiently in terms of time …

Tandem processor: Grappling with emerging operators in neural networks

S Ghodrati, S Kinzer, H Xu, R Mahapatra… - Proceedings of the 29th …, 2024 - dl.acm.org
With the ever-increasing prevalence of neural networks and the upheaval from language
models, it is time to rethink neural acceleration. Up to this point, the broader research …