Model compression and hardware acceleration for neural networks: A comprehensive survey
L Deng, G Li, S Han, L Shi, Y Xie - Proceedings of the IEEE, 2020
PACT: Parameterized clipping activation for quantized neural networks
Deep learning algorithms achieve high classification accuracy at the expense of significant
computation cost. To address this cost, a number of quantization schemes have been …
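The scheme this entry describes, PACT, replaces ReLU with an activation clipped at a learnable level α; the clipped range [0, α] is quantized uniformly and α is trained together with the network weights. A minimal PyTorch sketch of that idea (the module name, default bit-width, and initial α are illustrative assumptions, not values from the paper):

```python
import torch
import torch.nn as nn

class PACTQuant(nn.Module):
    """Clip activations to [0, alpha] with a learnable alpha, then
    quantize the clipped range uniformly to `bits` bits."""
    def __init__(self, bits: int = 4, alpha_init: float = 6.0):  # defaults are assumptions
        super().__init__()
        self.bits = bits
        self.alpha = nn.Parameter(torch.tensor(alpha_init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Clip to [0, alpha]; alpha receives a gradient wherever x >= alpha.
        y = torch.clamp(x, min=0.0) - torch.clamp(x - self.alpha, min=0.0)
        # Uniform quantization of [0, alpha] into 2^bits - 1 steps.
        scale = (2 ** self.bits - 1) / self.alpha
        y_q = torch.round(y * scale) / scale
        # Straight-through estimator: forward uses y_q, backward uses y.
        return y + (y_q - y).detach()

x = torch.randn(4, 8, requires_grad=True)
q = PACTQuant(bits=2)
q(x).sum().backward()
print(q.alpha.grad)  # alpha is trained with the task loss
```

Because rounding is non-differentiable, the sketch relies on a straight-through estimator; the clipping term is what lets the task loss shape α.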
Differentiable soft quantization: Bridging full-precision and low-bit neural networks
Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently
accelerate inference while also reducing the memory consumption of deep neural …
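The core idea here is to replace the hard rounding staircase with a scaled tanh inside each quantization bin, so the quantizer is differentiable end to end and can be annealed toward hard quantization by raising a temperature. A simplified sketch, assuming a fixed clipping range (the paper also learns the range) and an illustrative temperature k:

```python
import torch

def dsq(x: torch.Tensor, lo: float, hi: float, bits: int, k: float = 10.0):
    """Differentiable soft quantization: within each bin, a scaled tanh
    approximates the rounding step; larger k -> closer to the hard staircase."""
    levels = 2 ** bits - 1
    delta = (hi - lo) / levels          # bin width
    x = torch.clamp(x, lo, hi)
    # Index of the bin x falls into, and the bin's midpoint.
    i = torch.clamp(torch.floor((x - lo) / delta), max=levels - 1)
    mid = lo + (i + 0.5) * delta
    # Scaled tanh mapping [mid - delta/2, mid + delta/2] onto [-1, 1].
    s = 1.0 / torch.tanh(torch.tensor(0.5 * k * delta))
    phi = s * torch.tanh(k * (x - mid))
    return lo + delta * (i + 0.5 * (phi + 1.0))
```

At each bin midpoint the output equals the input, and at the bin edges it reaches the neighboring level boundaries, so the map is continuous and differentiable everywhere except the (zero-gradient) bin index.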
Learning to quantize deep networks by optimizing quantization intervals with task loss
Reducing the bit-widths of the activations and weights of deep networks makes them efficient to
compute and store in memory, which is crucial for their deployment on resource-limited …
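In this approach the quantization interval itself, parameterized by a center and a half-width, is trained with the task loss: values below the interval are pruned to zero, values above it are clipped, and values inside are rescaled and quantized. A simplified activation-only sketch (parameter names and initial values are my own; the paper uses a more general non-linear transformer):

```python
import torch
import torch.nn as nn

class QILActQuant(nn.Module):
    """Learn the quantization interval [center - dist, center + dist]
    jointly with the network, via the task loss."""
    def __init__(self, bits: int = 2):
        super().__init__()
        self.bits = bits
        self.center = nn.Parameter(torch.tensor(0.5))  # illustrative inits
        self.dist = nn.Parameter(torch.tensor(0.5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        lo = self.center - self.dist
        hi = self.center + self.dist
        # Prune below lo, clip above hi, rescale the interval to [0, 1].
        t = torch.clamp((x - lo) / (hi - lo).clamp(min=1e-6), 0.0, 1.0)
        # Uniform quantization with a straight-through estimator.
        levels = 2 ** self.bits - 1
        t_q = torch.round(t * levels) / levels
        return t + (t_q - t).detach()
```

Gradients reach the center and half-width through the unclipped region, so the network can trade pruning against clipping to minimize the task loss directly.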
Accurate and efficient 2-bit quantized neural networks
J Choi, S Venkataramani… - Proceedings of …, 2019 - proceedings.mlsys.org
Deep learning algorithms achieve high classification accuracy at the expense of significant
computation cost. In order to reduce this cost, several quantization schemes have gained …
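This line of work keeps both weights and activations at 2 bits by choosing clipping ranges carefully, e.g. from weight statistics. A hedged sketch of statistics-based 2-bit symmetric weight quantization (the scale rule a = 2.5·std is a stand-in assumption, not the paper's fitted statistics-aware coefficients):

```python
import torch

def quantize_weights_2bit(w: torch.Tensor) -> torch.Tensor:
    """Symmetric 2-bit weight quantization to the four levels
    {-a, -a/3, +a/3, +a}, with the clipping scale a derived from
    weight statistics (the 2.5*std rule is an illustrative choice)."""
    a = (2.5 * w.std()).clamp(min=1e-8)
    # Map weights to [-1, 1], then round onto the 4-level uniform grid.
    t = torch.clamp(w / a, -1.0, 1.0)
    q = torch.round((t + 1.0) * 1.5) / 1.5 - 1.0
    return q * a
```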
Compression of deep learning models for text: A survey
In recent years, the fields of natural language processing (NLP) and information retrieval (IR)
have made tremendous progress thanks to deep learning models like Recurrent Neural …
Adabits: Neural network quantization with adaptive bit-widths
Deep neural networks with adaptive configurations have gained increasing attention because
they allow instant and flexible deployment on platforms with different resource …
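Adaptive bit-width here means one model whose quantizers can run at several precisions, selected at deployment time to fit the platform's budget. A minimal sketch of a switchable weight quantizer (the class and method names are illustrative; the paper additionally trains all candidate bit-widths jointly):

```python
import torch
import torch.nn as nn

class SwitchableQuant(nn.Module):
    """One set of full-precision weights, several candidate bit-widths;
    the active precision is switched at run time."""
    def __init__(self, bit_widths=(2, 4, 8)):
        super().__init__()
        self.bit_widths = bit_widths
        self.active_bits = bit_widths[-1]   # default: highest precision

    def set_bits(self, bits: int) -> None:
        assert bits in self.bit_widths
        self.active_bits = bits

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        # Symmetric uniform quantization at the active bit-width.
        levels = 2 ** (self.active_bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / levels
        return torch.round(w / scale).clamp(-levels, levels) * scale

q = SwitchableQuant()
q.set_bits(4)               # e.g., drop to 4 bits on a constrained device
w_q = q(torch.randn(64, 64))
```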
Energy-efficient neural network accelerator based on outlier-aware low-precision computation
Owing to the presence of large values, which we call outliers, conventional methods of
quantization fail to achieve significantly low precision, e.g., four bits, for very deep neural …
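The accelerator's premise is that a few large-magnitude values force a wide clipping range and waste the low-precision grid on the rest of the tensor; handling outliers separately lets the bulk of the values use, e.g., 4 bits. A sketch of that split (the 1% outlier ratio is an assumption for illustration):

```python
import torch

def outlier_aware_quantize(w: torch.Tensor, bits: int = 4,
                           outlier_frac: float = 0.01):
    """Dense low-precision part for most values, plus a small sparse set
    of full-precision outliers, so the clipping range is set by the bulk
    of the distribution rather than by its extremes."""
    k = max(1, int(outlier_frac * w.numel()))
    # Threshold separating the k largest-magnitude values (outliers).
    thresh = w.abs().flatten().topk(k).values.min()
    outlier_mask = w.abs() >= thresh
    # Dense part: uniform low-precision quantization of the non-outliers.
    levels = 2 ** (bits - 1) - 1
    scale = (thresh / levels).clamp(min=1e-8)
    dense = torch.round(torch.clamp(w, -thresh, thresh) / scale) * scale
    # Outliers stay in full precision; hardware would store them sparsely.
    return torch.where(outlier_mask, w, dense), outlier_mask
```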