Advances in the neural network quantization: A comprehensive review
L Wei, Z Ma, C Yang, Q Yao - Applied Sciences, 2024 - mdpi.com
Artificial intelligence technologies based on deep convolutional neural networks and large
language models have made significant breakthroughs in many tasks, such as image …
Advances in neural architecture search
Automated machine learning (AutoML) has achieved remarkable success in automating the
non-trivial process of designing machine learning models. Among the focal areas of AutoML …
Retraining-free model quantization via one-shot weight-coupling learning
Quantization is of significance for compressing the over-parameterized deep neural models
and deploying them on resource-limited devices. Fixed-precision quantization suffers from …
Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precision
Despite the outstanding performance of transformers in both language and vision tasks, the
expanding computation and model size have increased the demand for efficient …
Hessian-based mixed-precision quantization with transition aware training for neural networks
Z Huang, X Han, Z Yu, Y Zhao, M Hou, S Hu - Neural Networks, 2025 - Elsevier
Model quantization is widely used to realize the promise of ubiquitous embedded
deep network inference. While mixed-precision quantization has shown promising …
TMPQ-DM: Joint Timestep Reduction and Quantization Precision Selection for Efficient Diffusion Models
Diffusion models have emerged as preeminent contenders in the realm of generative
models. Distinguished by their distinctive sequential generative processes, characterized by …
Bit-Weight Adjustment for Bridging Uniform and Non-Uniform Quantization to Build Efficient Image Classifiers
Network quantization, which strives to reduce the precision of model parameters and/or
features, is one of the most efficient ways to accelerate model inference and reduce memory …
Investigating the Impact of Quantization on Adversarial Robustness
Quantization is a promising technique for reducing the bit-width of deep models to improve
their runtime performance and storage efficiency, and thus becomes a fundamental step for …
Mixed-Precision Embeddings for Large-Scale Recommendation Models
Embedding techniques have become essential components of large databases in the deep
learning era. By encoding discrete entities, such as words, items, or graph nodes, into …
Accelerating CNN Inference with an Adaptive Quantization Method Using Computational Complexity-Aware Regularization
K Nakata, D Miyashita, J Deguchi… - IEICE Transactions on …, 2024 - jstage.jst.go.jp
Quantization is commonly used to reduce the inference time of convolutional neural
networks (CNNs). To reduce the inference time without drastically reducing accuracy …