A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-Power Computer …, 2022 - taylorfrancis.com
This chapter provides approaches to the problem of quantizing the numerical values in deep
Neural Network computations, covering the advantages/disadvantages of current methods …

Edge-cloud polarization and collaboration: A comprehensive survey for ai

J Yao, S Zhang, Y Yao, F Wang, J Ma… - … on Knowledge and …, 2022 - ieeexplore.ieee.org
Influenced by the great success of deep learning via cloud computing and the rapid
development of edge chips, research in artificial intelligence (AI) has shifted to both of the …

Computational complexity evaluation of neural network applications in signal processing

P Freire, S Srivallapanondh, A Napoli… - ar…

Qdrop: Randomly dropping quantization for extremely low-bit post-training quantization

X Wei, R Gong, Y Li, X Liu, F Yu - arXiv preprint arXiv:2203.05740, 2022 - arxiv.org
Recently, post-training quantization (PTQ) has driven much attention to produce efficient
neural networks without long-time retraining. Despite its low cost, current PTQ works tend to …

I-vit: Integer-only quantization for efficient vision transformer inference

Z Li, Q Gu - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
Vision Transformers (ViTs) have achieved state-of-the-art performance on various
computer vision applications. However, these models have considerable storage and …