Model compression and hardware acceleration for neural networks: A comprehensive survey

L Deng, G Li, S Han, L Shi, Y Xie - Proceedings of the IEEE, 2020 - ieeexplore.ieee.org
Domain-specific hardware is becoming a promising topic against the backdrop of slowing
improvement in general-purpose processors due to the foreseeable end of Moore's Law …

An overview of neural network compression

JO Neill - arXiv preprint arXiv:2006.03669, 2020 - arxiv.org
Overparameterized networks trained to convergence have shown impressive performance
in domains such as computer vision and natural language processing. Pushing state of the …

Pruning and quantization for deep neural network acceleration: A survey

T Liang, J Glossner, L Wang, S Shi, X Zhang - Neurocomputing, 2021 - Elsevier
Deep neural networks have been applied in many applications exhibiting extraordinary
abilities in the field of computer vision. However, complex network architectures challenge …

Training deep neural networks with 8-bit floating point numbers

N Wang, J Choi, D Brand, CY Chen… - Advances in neural …, 2018 - proceedings.neurips.cc
The state-of-the-art hardware platforms for training deep neural networks are moving from
traditional single precision (32-bit) computations towards 16 bits of precision, in large part …

A study of BFLOAT16 for deep learning training

D Kalamkar, D Mudigere, N Mellempudi, D Das… - arXiv preprint arXiv …, 2019 - arxiv.org
This paper presents the first comprehensive empirical study demonstrating the efficacy of the
Brain Floating Point (BFLOAT16) half-precision format for Deep Learning training across …
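As context for the format this study examines: BFLOAT16 keeps FP32's sign bit and full 8-bit exponent but truncates the mantissa from 23 bits to 7, so converting is just keeping the top 16 bits of the FP32 bit pattern. A minimal illustrative sketch in plain Python (not code from the paper; truncation shown here drops mantissa bits rather than rounding):

```python
import struct

def fp32_to_bfloat16_bits(x: float) -> int:
    # Reinterpret the float32 bit pattern as an integer and keep the
    # top 16 bits: 1 sign bit, 8 exponent bits (same as FP32), 7 mantissa bits.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 16  # simple truncation of the low mantissa bits

def bfloat16_bits_to_fp32(b: int) -> float:
    # Widen back to FP32 by zero-filling the dropped mantissa bits.
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]

# Round-tripping shows the reduced mantissa precision:
y = bfloat16_bits_to_fp32(fp32_to_bfloat16_bits(3.14159))  # 3.140625
```

Because the exponent width matches FP32, BFLOAT16 covers the same dynamic range as single precision, which is a key reason the study finds it workable for training without loss scaling.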

Floatpim: In-memory acceleration of deep neural network training with high precision

M Imani, S Gupta, Y Kim, T Rosing - Proceedings of the 46th International …, 2019 - dl.acm.org
Processing In-Memory (PIM) has shown a great potential to accelerate inference tasks of
Convolutional Neural Network (CNN). However, existing PIM architectures do not support …

Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors

CN Coelho, A Kuusela, S Li, H Zhuang… - Nature Machine …, 2021 - nature.com
Although the quest for more accurate solutions is pushing deep learning research towards
larger and more complex algorithms, edge devices demand efficient inference and therefore …

Towards unified int8 training for convolutional neural network

F Zhu, R Gong, F Yu, X Liu, Y Wang… - Proceedings of the …, 2020 - openaccess.thecvf.com
Recently, low-bit (e.g., 8-bit) network quantization has been extensively studied to
accelerate the inference. Besides inference, low-bit training with quantized gradients can …

A Neural-Network-Based Model Predictive Control of Three-Phase Inverter With an Output Filter

IS Mohamed, S Rovetta, TD Do, T Dragicević… - IEEE …, 2019 - ieeexplore.ieee.org
Model predictive control (MPC) has become one of the well-established modern control
methods for three-phase inverters with an output LC filter, where a high-quality voltage with …

Use of neural networks for stable, accurate and physically consistent parameterization of subgrid atmospheric processes with good performance at reduced precision

J Yuval, PA O'Gorman, CN Hill - Geophysical Research Letters, 2021 - Wiley Online Library
A promising approach to improve climate-model simulations is to replace traditional subgrid
parameterizations based on simplified physical models with machine learning algorithms that …