A survey on efficient convolutional neural networks and hardware acceleration
Over the past decade, deep-learning-based representations have demonstrated remarkable
performance in academia and industry. The learning capability of convolutional neural …
Model compression and hardware acceleration for neural networks: A comprehensive survey
Domain-specific hardware is becoming a promising topic in the backdrop of improvement
slow down for general-purpose processors due to the foreseeable end of Moore's Law …
Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …
SpAtten: Efficient sparse attention architecture with cascade token and head pruning
The attention mechanism is becoming increasingly popular in Natural Language Processing
(NLP) applications, showing superior performance than convolutional and recurrent …
Efficient acceleration of deep learning inference on resource-constrained edge devices: A review
Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted
in breakthroughs in many areas. However, deploying these highly accurate models for data …
SIGMA: A sparse and irregular GEMM accelerator with flexible interconnects for DNN training
The advent of Deep Learning (DL) has radically transformed the computing industry across
the entire spectrum from algorithms to circuits. As myriad application domains embrace DL, it …
Eyeriss v2: A flexible accelerator for emerging deep neural networks on mobile devices
A recent trend in deep neural network (DNN) development is to extend the reach of deep
learning applications to platforms that are more resource and energy-constrained, e.g. …
AMC: AutoML for model compression and acceleration on mobile devices
Model compression is an effective technique to efficiently deploy neural network
models on mobile devices which have limited computation resources and tight power …
Machine learning at Facebook: Understanding inference at the edge
At Facebook, machine learning provides a wide range of capabilities that drive many
aspects of user experience including ranking posts, content understanding, object detection …
Machine learning at the network edge: A survey
Resource-constrained IoT devices, such as sensors and actuators, have become ubiquitous
in recent years. This has led to the generation of large quantities of data in real-time, which …