Outlier suppression: Pushing the limit of low-bit transformer language models

X Wei, Y Zhang, X Zhang, R Gong… - Advances in …, 2022 - proceedings.neurips.cc
The Transformer architecture has become the fundamental building block of widespread natural
language processing (NLP) models. With the trend toward large NLP models, the increasing …
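
The outlier-suppression line of work (this entry and the Outlier Suppression+ paper listed below) rests on an equivalence: activation channels with outliers can be rescaled, provided the inverse scale is folded into the next layer's weights, so the quantizer sees a tighter range while the network's output is unchanged. A minimal sketch of that idea, assuming a calibration batch and a following `torch.nn.Linear`; the function name and percentile heuristic are illustrative, not the papers' optimal scaling procedure:

```python
import torch

def migrate_outliers(act_sample, linear, percentile=0.999):
    """Shrink per-channel activation outliers by an equivalent rescaling.

    Divides each activation channel j by s_j at runtime and multiplies the
    matching input column of the following linear layer's weight by s_j,
    so (x / s) @ (W * s).T == x @ W.T, while the activation range that the
    quantizer sees becomes much tighter.
    act_sample: calibration activations, shape (N, C).
    """
    # Per-channel scale from calibration data (hypothetical heuristic).
    s = act_sample.abs().quantile(percentile, dim=0).clamp(min=1e-5)
    s = s / s.max()              # keep scales <= 1 so weights do not blow up
    with torch.no_grad():
        linear.weight.mul_(s)    # weight has shape (out, in); scales column j by s_j
    return s                     # at runtime: quantize x / s instead of x
```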

Outlier suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling

X Wei, Y Zhang, Y Li, X Zhang, R Gong, J Guo… - arXiv preprint, 2023 - arxiv.org

Neuromorphic data augmentation for training spiking neural networks

Y Li, Y Kim, H Park, T Geller, P Panda - European Conference on Computer Vision, 2022 - Springer
Developing neuromorphic intelligence on event-based datasets with Spiking Neural
Networks (SNNs) has recently attracted much research attention. However, the limited size …

Qdrop: Randomly dropping quantization for extremely low-bit post-training quantization

X Wei, R Gong, Y Li, X Liu, F Yu - arXiv preprint arXiv:2203.05740, 2022 - arxiv.org
Recently, post-training quantization (PTQ) has attracted much attention as a way to produce
efficient neural networks without lengthy retraining. Despite its low cost, current PTQ works tend to …
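
The title names the core trick: during PTQ calibration, quantization is applied to activations only at random, so the network being reconstructed also sees full-precision values. A minimal fake-quantization sketch of that idea; the function name, drop probability, and 8-bit range are assumptions, and the paper's full block-reconstruction pipeline is not shown:

```python
import torch

def qdrop_fake_quant(x, scale, zero_point, p_drop=0.5, qmin=-128, qmax=127):
    """Fake-quantize activations, but randomly keep elements in full
    precision with probability p_drop, in the spirit of 'randomly
    dropping quantization'. Used only during PTQ calibration.
    """
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    x_q = (q - zero_point) * scale           # dequantized (fake-quant) value
    keep_fp = torch.rand_like(x) < p_drop    # elementwise drop mask
    return torch.where(keep_fp, x, x_q)
```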

Adaptive data-free quantization

B Qian, Y Wang, R Hong… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Data-free quantization (DFQ) recovers the performance of a quantized network (Q) without the
original data by generating fake samples via a generator (G) that learns from the full …
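
A generic generator-based DFQ loop, sketched under assumptions: G synthesizes fake samples from noise, Q is distilled on them against the full-precision network, and G is pushed toward samples on which the two models disagree (one common adversarial variant; the paper's adaptive objective differs, and all names here are illustrative):

```python
import torch
import torch.nn.functional as F

def dfq_step(generator, fp_model, q_model, g_opt, q_opt, batch=64, zdim=128):
    """One illustrative data-free quantization training step."""
    z = torch.randn(batch, zdim)
    fake = generator(z)

    # Update Q: distill the full-precision model's predictions on fake data.
    q_opt.zero_grad()
    loss_q = F.kl_div(F.log_softmax(q_model(fake.detach()), dim=1),
                      F.softmax(fp_model(fake.detach()), dim=1),
                      reduction='batchmean')
    loss_q.backward()
    q_opt.step()

    # Update G: maximize the Q-vs-FP disagreement to mine hard samples.
    g_opt.zero_grad()
    loss_g = -F.kl_div(F.log_softmax(q_model(fake), dim=1),
                       F.softmax(fp_model(fake), dim=1),
                       reduction='batchmean')
    loss_g.backward()
    g_opt.step()
```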

Intraq: Learning synthetic images with intra-class heterogeneity for zero-shot network quantization

Y Zhong, M Lin, G Nan, J Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Learning to synthesize data has emerged as a promising direction in zero-shot quantization
(ZSQ), which represents neural networks with low-bit integers without accessing any of the real …
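
A common ZSQ recipe synthesizes calibration images by optimizing noise against the pretrained full-precision model. The sketch below shows only that generic step, with assumed names and shapes; IntraQ's intra-class heterogeneity term is deliberately omitted:

```python
import torch
import torch.nn.functional as F

def synthesize_calibration_images(fp_model, labels, steps=200, lr=0.1):
    """Optimize random noise so the pretrained full-precision model
    (assumed to be in eval mode) classifies it as the requested labels;
    only the noise tensor is updated. labels: LongTensor of class ids.
    """
    x = torch.randn(len(labels), 3, 224, 224, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(fp_model(x), labels)
        loss.backward()
        opt.step()
    return x.detach()   # used to calibrate/finetune the low-bit network
```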

Bibench: Benchmarking and analyzing network binarization

H Qin, M Zhang, Y Ding, A Li, Z Cai… - International …, 2023 - proceedings.mlr.press
Network binarization has emerged as one of the most promising compression approaches,
offering extraordinary computation and memory savings by minimizing the bit-width …
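
Binarization reduces weights to a single bit. A minimal sketch of the usual primitive, assuming the common sign-function forward with a straight-through estimator backward; real binarized networks add scaling factors and architecture tweaks, which benchmarks like this one evaluate:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """1-bit quantizer: sign() forward, straight-through backward."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        # Map weights to {-1, +1} (ties at zero sent to +1).
        return torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Straight-through estimator: pass gradients only where |w| <= 1.
        return grad_out * (w.abs() <= 1).to(grad_out.dtype)

# Usage: w_bin = BinarizeSTE.apply(layer.weight)
```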