Q-diffusion: Quantizing diffusion models

X Li, Y Liu, L Lian, H Yang, Z Dong… - Proceedings of the …, 2023 - openaccess.thecvf.com
Diffusion models have achieved great success in image synthesis through iterative noise
estimation using deep neural networks. However, the slow inference, high memory …

Olive: Accelerating large language models via hardware-friendly outlier-victim pair quantization

C Guo, J Tang, W Hu, J Leng, C Zhang… - Proceedings of the 50th …, 2023 - dl.acm.org
Transformer-based large language models (LLMs) have achieved great success as model sizes have grown. LLM size grows by 240× every two years, which outpaces the …

Outlier suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling

X Wei, Y Zhang, Y Li, X Zhang, R Gong, J Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
Post-training quantization (PTQ) of transformer language models faces significant
challenges due to the existence of detrimental outliers in activations. We observe that these …

Compressing large language models by joint sparsification and quantization

J Guo, J Wu, Z Wang, J Liu, G Yang, Y Ding… - … on Machine Learning, 2024 - openreview.net
In this paper, we introduce a novel model compression technique named Joint Sparsification
and Quantization (JSQ), explicitly tailored for large language models (LLMs). Traditional …

Noisyquant: Noisy bias-enhanced post-training activation quantization for vision transformers

Y Liu, H Yang, Z Dong, K Keutzer… - Proceedings of the …, 2023 - openaccess.thecvf.com
The complicated architecture and high training cost of vision transformers motivate the
exploration of post-training quantization. However, the heavy-tailed distribution of vision …

Intraq: Learning synthetic images with intra-class heterogeneity for zero-shot network quantization

Y Zhong, M Lin, G Nan, J Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Learning to synthesize data has emerged as a promising direction in zero-shot quantization
(ZSQ), which represents neural networks with low-bit integers without accessing any of the real …

Ant: Exploiting adaptive numerical data type for low-bit deep neural network quantization

C Guo, C Zhang, J Leng, Z Liu, F Yang… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Quantization is a technique to reduce the computation and memory cost of DNN models,
which are getting increasingly large. Existing quantization solutions use fixed-point integer …

Hard sample matters a lot in zero-shot quantization

H Li, X Wu, F Lv, D Liao, TH Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Zero-shot quantization (ZSQ) is promising for compressing and accelerating deep neural
networks when the data for training full-precision models are inaccessible. In ZSQ, network …

DVABatch: Diversity-aware Multi-Entry Multi-Exit batching for efficient processing of DNN services on GPUs

W Cui, H Zhao, Q Chen, H Wei, Z Li, D Zeng… - 2022 USENIX Annual …, 2022 - usenix.org
DNN inferences are often batched to better utilize the hardware in existing DNN serving
systems. However, DNN serving exhibits diversity in many aspects, such as input …
