Hardware and software optimizations for accelerating deep neural networks: Survey of current trends, challenges, and the road ahead
Currently, Machine Learning (ML) is becoming ubiquitous in everyday life. Deep Learning
(DL) is already present in many applications ranging from computer vision for medicine to …
ConfuciuX: Autonomous hardware resource assignment for DNN accelerators using reinforcement learning
DNN accelerators provide efficiency by leveraging reuse of activations/weights/outputs
during the DNN computations to reduce data movement from DRAM to the chip. The reuse is …
Hardware acceleration of sparse and irregular tensor computations of ML models: A survey and insights
Machine learning (ML) models are widely used in many important domains. For efficiently
processing these computational- and memory-intensive applications, tensors of these …
A multi-neural network acceleration architecture
A cost-effective multi-tenant neural network execution is becoming one of the most important
design goals for modern neural network accelerators. For example, as emerging AI services …
Procrustes: a dataflow and accelerator for sparse deep neural network training
The success of DNN pruning has led to the development of energy-efficient inference
accelerators that support pruned models with sparse weight and activation tensors. Because …
Laconic deep learning inference acceleration
We present a method for transparently identifying ineffectual computations during inference
with Deep Learning models. Specifically, by decomposing multiplications down to the bit …
Gemmini: An agile systolic array generator enabling systematic evaluations of deep-learning architectures
Advances in deep learning and neural networks have resulted in rapid development of
hardware accelerators that support them. A large majority of ASIC accelerators, however …
FlexCNN: An end-to-end framework for composing CNN accelerators on FPGA
With reduced data reuse and parallelism, recent convolutional neural networks (CNNs)
create new challenges for FPGA acceleration. Systolic arrays (SAs) are efficient, scalable …
Review and benchmarking of precision-scalable multiply-accumulate unit architectures for embedded neural-network processing
The current trend for deep learning has come with an enormous computational need for
billions of Multiply-Accumulate (MAC) operations per inference. Fortunately, reduced …
dMazeRunner: Executing perfectly nested loops on dataflow accelerators
Dataflow accelerators feature simplicity, programmability, and energy-efficiency and are
viewed as a promising architecture for accelerating perfectly nested loops that dominate …