Accelerating CNN inference on ASICs: A survey

D Moolchandani, A Kumar, SR Sarangi - Journal of Systems Architecture, 2021 - Elsevier
Convolutional neural networks (CNNs) have proven to be a disruptive technology in most
vision, speech and image processing tasks. Given their ubiquitous acceptance, the research …

Cambricon: An instruction set architecture for neural networks

S Liu, Z Du, J Tao, D Han, T Luo, Y Xie… - ACM SIGARCH …, 2016 - dl.acm.org
Neural Networks (NN) are a family of models for a broad range of emerging machine
learning and pattern recognition applications. NN techniques are conventionally executed …

Origami: A 803-GOp/s/W convolutional network accelerator

L Cavigelli, L Benini - … Transactions on Circuits and Systems for …, 2016 - ieeexplore.ieee.org
An ever-increasing number of computer vision and image/video processing challenges are
being approached using deep convolutional neural networks, obtaining state-of-the-art …

14.6 A 1.42 TOPS/W deep convolutional neural network recognition processor for intelligent IoE systems

J Sim, JS Park, M Kim, D Bae, Y Choi… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
In this paper, we present an energy-efficient CNN processor with 4 key features: (1) a CNN-
optimized neuron processing engine (NPE), (2) a dual-range multiply-accumulate (DRMAC) …

14.1 A 2.9 TOPS/W deep convolutional neural network SoC in FD-SOI 28nm for intelligent embedded systems

G Desoli, N Chawla, T Boesch, S Singh… - … Solid-State Circuits …, 2017 - ieeexplore.ieee.org
A booming number of computer vision, speech recognition, and signal processing
applications are increasingly benefiting from the use of deep convolutional neural networks …

Neurostream: Scalable and energy efficient deep learning with smart memory cubes

E Azarkhish, D Rossi, I Loi… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
High-performance computing systems are moving towards 2.5D and 3D memory
hierarchies, based on High Bandwidth Memory (HBM) and Hybrid Memory Cube (HMC) to …

Data and hardware efficient design for convolutional neural network

YJ Lin, TS Chang - IEEE Transactions on Circuits and Systems I …, 2017 - ieeexplore.ieee.org
Hardware design of deep convolutional neural networks (CNNs) faces challenges of high
computational complexity and data bandwidth as well as huge divergence in different CNN …

Data-optimized neural network traversal

JW Brothers, J Lee - US Patent 10,417,555, 2019 - Google Patents
Executing a neural network includes generating an output tile of a first layer of the neural
network by processing an input tile to the first layer and storing the output tile of the first layer …

HERO: Heterogeneous embedded research platform for exploring RISC-V manycore accelerators on FPGA

A Kurth, P Vogel, A Capotondi, A Marongiu… - arXiv preprint arXiv …, 2017 - arxiv.org
Heterogeneous embedded systems on chip (HESoCs) co-integrate a standard host
processor with programmable manycore accelerators (PMCAs) to combine general-purpose …

VWA: Hardware efficient vectorwise accelerator for convolutional neural network

KW Chang, TS Chang - … Transactions on Circuits and Systems I …, 2019 - ieeexplore.ieee.org
Hardware accelerators for convolutional neural networks (CNNs) enable real-time
applications of artificial intelligence technology. However, most of the existing designs suffer …