Advances in the neural network quantization: A comprehensive review

L Wei, Z Ma, C Yang, Q Yao - Applied Sciences, 2024 - mdpi.com
Artificial intelligence technologies based on deep convolutional neural networks and large
language models have made significant breakthroughs in many tasks, such as image …

Advances in neural architecture search

X Wang, W Zhu - National Science Review, 2024 - academic.oup.com
Automated machine learning (AutoML) has achieved remarkable success in automating the
non-trivial process of designing machine learning models. Among the focal areas of AutoML …

Retraining-free model quantization via one-shot weight-coupling learning

C Tang, Y Meng, J Jiang, S Xie, R Lu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Quantization is of significance for compressing the over-parameterized deep neural models
and deploying them on resource-limited devices. Fixed-precision quantization suffers from …
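
As context for the fixed-precision baseline the snippet mentions, here is a minimal sketch of standard affine min-max quantization of a single tensor. The function names and the min-max calibration are illustrative assumptions, not the paper's weight-coupling method.

```python
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int = 8):
    """Affine min-max quantization to `bits` bits (illustrative sketch,
    not the paper's one-shot weight-coupling approach)."""
    qmax = 2**bits - 1
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.clip(np.round((w - lo) / scale), 0, qmax).astype(np.int32)
    return q, scale, lo

def dequantize(q, scale, lo):
    # Reconstruct approximate real values from the integer codes.
    return q * scale + lo

w = np.random.randn(4, 4).astype(np.float32)
q, s, z = quantize_uniform(w, bits=4)
print(np.abs(dequantize(q, s, z) - w).max())  # worst-case quantization error
```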

Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precision

X Huang, Z Shen, P Dong, KT Cheng - Transactions on Machine …, 2024 - openreview.net
Despite the outstanding performance of transformers in both language and vision tasks, the
expanding computation and model size have increased the demand for efficient …

Hessian-based mixed-precision quantization with transition aware training for neural networks

Z Huang, X Han, Z Yu, Y Zhao, M Hou, S Hu - Neural Networks, 2025 - Elsevier
Model quantization is widely used to realize the promise of ubiquitous embedded
deep network inference. While mixed-precision quantization has shown promising …
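
Hessian-based criteria generally allocate more bits to layers where the loss surface has higher curvature. A common way to estimate per-layer curvature is Hutchinson's trace estimator built from Hessian-vector products; the sketch below is a generic PyTorch version of that estimator, an assumed proxy rather than the paper's transition-aware training scheme.

```python
import torch

def hessian_trace(loss, params, n_samples=8):
    """Hutchinson estimate of tr(H): E[v^T H v] with Rademacher v.
    Illustrative sensitivity proxy, not the paper's exact criterion."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    est = 0.0
    for _ in range(n_samples):
        vs = [torch.randint_like(p, 2) * 2 - 1 for p in params]
        gv = sum((g * v).sum() for g, v in zip(grads, vs))
        hvs = torch.autograd.grad(gv, params, retain_graph=True)
        est += sum((h * v).sum().item() for h, v in zip(hvs, vs))
    return est / n_samples

# Layers with a larger trace are more sensitive and would keep wider bit-widths.
layer = torch.nn.Linear(16, 16)
loss = layer(torch.randn(8, 16)).pow(2).mean()
print(hessian_trace(loss, list(layer.parameters())))
```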

TMPQ-DM: Joint Timestep Reduction and Quantization Precision Selection for Efficient Diffusion Models

H Sun, C Tang, Z Wang, Y Meng, X Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have emerged as preeminent contenders in the realm of generative
models. Distinguished by their distinctive sequential generative processes, characterized by …

Bit-Weight Adjustment for Bridging Uniform and Non-Uniform Quantization to Build Efficient Image Classifiers

X Zhou, Y Duan, R Ding, Q Wang, Q Wang, J Qin, H Liu - Electronics, 2023 - mdpi.com
Network quantization, which strives to reduce the precision of model parameters and/or
features, is one of the most efficient ways to accelerate model inference and reduce memory …
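
The uniform/non-uniform distinction in the title can be made concrete with a toy comparison: a uniform quantizer spaces its levels evenly over the weight range, while a non-uniform one fits levels to the weight distribution. The 1-D k-means codebook below is an assumed stand-in for a non-uniform scheme, not the paper's bit-weight adjustment.

```python
import numpy as np

def uniform_levels(w, bits):
    # Evenly spaced levels over the weight range.
    return np.linspace(w.min(), w.max(), 2**bits)

def kmeans_levels(w, bits, iters=20):
    # Non-uniform levels fitted to the weight distribution (1-D k-means).
    levels = uniform_levels(w, bits)
    flat = w.ravel()
    for _ in range(iters):
        idx = np.abs(flat[:, None] - levels[None, :]).argmin(axis=1)
        for k in range(len(levels)):
            if np.any(idx == k):
                levels[k] = flat[idx == k].mean()
    return np.sort(levels)

def quantize_to(w, levels):
    # Snap each weight to its nearest level.
    return levels[np.abs(w[..., None] - levels).argmin(axis=-1)]

w = np.random.randn(1024)
for name, lv in [("uniform", uniform_levels(w, 3)), ("k-means", kmeans_levels(w, 3))]:
    print(name, "MSE:", np.mean((quantize_to(w, lv) - w) ** 2))
```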

Investigating the Impact of Quantization on Adversarial Robustness

Q Li, Y Meng, C Tang, J Jiang, Z Wang - arXiv preprint arXiv:2404.05639, 2024 - arxiv.org
Quantization is a promising technique for reducing the bit-width of deep models to improve
their runtime performance and storage efficiency, and thus becomes a fundamental step for …

Mixed-Precision Embeddings for Large-Scale Recommendation Models

S Li, Z Hu, X Tang, H Wang, S Xu, W Luo, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Embedding techniques have become essential components of large databases in the deep
learning era. By encoding discrete entities, such as words, items, or graph nodes, into …
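
A common premise behind mixed-precision embeddings is that access frequencies are heavily skewed, so frequently hit rows can justify more bits than rare ones. The hot/cold split and per-row quantizer below are assumptions for illustration only, not the paper's allocation method.

```python
import numpy as np

def assign_bits(freqs, hot_bits=8, cold_bits=2, hot_frac=0.1):
    # Give the most frequently accessed rows the higher bit-width.
    order = np.argsort(freqs)[::-1]
    bits = np.full(len(freqs), cold_bits)
    bits[order[: int(len(freqs) * hot_frac)]] = hot_bits
    return bits

def quantize_row(row, bits):
    # Per-row affine min-max quantization, returning the reconstruction.
    qmax = 2**bits - 1
    lo, hi = row.min(), row.max()
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.clip(np.round((row - lo) / scale), 0, qmax)
    return q * scale + lo

table = np.random.randn(1000, 16).astype(np.float32)
freqs = np.random.zipf(2.0, size=1000)  # skewed access counts
bits = assign_bits(freqs)
recon = np.stack([quantize_row(r, b) for r, b in zip(table, bits)])
print("mean abs error:", np.abs(recon - table).mean())
```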

Accelerating CNN Inference with an Adaptive Quantization Method Using Computational Complexity-Aware Regularization

K Nakata, D Miyashita, J Deguchi… - IEICE Transactions on …, 2024 - jstage.jst.go.jp
Quantization is commonly used to reduce the inference time of convolutional neural
networks (CNNs). To reduce the inference time without drastically reducing accuracy …