Outlier suppression: Pushing the limit of low-bit transformer language models
The Transformer architecture has become a fundamental element of widespread natural language processing (NLP) models. With the trend toward large NLP models, the increasing …
Outlier suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
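The title points to an equivalent shifting-and-scaling transformation. As a rough illustration of how a per-channel shift and scale can be folded into the following linear layer without changing the network output, here is a minimal sketch; the shift and scale values below are placeholders, whereas the paper chooses them to optimally suppress activation outliers:

```python
import torch

# Per-channel shift z and scale s applied to activations X, with the inverse
# folded into the next linear layer (W, b) so the output is unchanged.
torch.manual_seed(0)
X = torch.randn(4, 8)                         # activations with outlier channels
W = torch.randn(16, 8)                        # next linear layer: out = X @ W.T + b
b = torch.randn(16)

z = X.mean(dim=0)                             # illustrative shift (paper optimizes these)
s = X.abs().amax(dim=0).clamp(min=1e-5)       # illustrative scale

X_t = (X - z) / s                             # transformed activations: outliers suppressed
W_t = W * s                                   # fold scale into weights (per input channel)
b_t = b + W @ z                               # fold shift into bias

out_orig = X @ W.T + b
out_equiv = X_t @ W_t.T + b_t
print(torch.allclose(out_orig, out_equiv, atol=1e-5))  # True: the transform is equivalent
```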
QDrop: Randomly dropping quantization for extremely low-bit post-training quantization
Recently, post-training quantization (PTQ) has drawn much attention as a way to produce efficient neural networks without lengthy retraining. Despite its low cost, current PTQ works tend to …
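As the title suggests, the core idea is to randomly drop the quantization of activations during PTQ calibration. A minimal sketch of that idea, assuming a simple uniform fake-quantizer; the module and the drop_prob parameter below are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

def fake_quant(x, n_bits=4):
    # Uniform symmetric fake quantization (illustrative; real PTQ methods
    # calibrate scales per tensor or per channel from data).
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

class RandomDropQuant(nn.Module):
    """Fake-quantize activations, but randomly keep some elements in full
    precision during calibration (a hypothetical re-creation of the idea in
    the QDrop title; drop_prob is an illustrative parameter)."""
    def __init__(self, n_bits=4, drop_prob=0.5):
        super().__init__()
        self.n_bits = n_bits
        self.drop_prob = drop_prob

    def forward(self, x):
        xq = fake_quant(x, self.n_bits)
        if self.training:                       # calibration phase
            keep_fp = torch.rand_like(x) < self.drop_prob
            return torch.where(keep_fp, x, xq)  # randomly bypass quantization
        return xq                               # deployment: quantize everything
```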
Adaptive data-free quantization
Data-free quantization (DFQ) recovers the performance of a quantized network (Q) without the original data, but generates fake samples via a generator (G) by learning from the full …
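To make the G/Q setup above concrete, here is a minimal, generic sketch of generator-based data-free calibration, not the specific adaptive algorithm of this paper: an assumed pre-trained generator produces fake samples, and the quantized network Q is tuned to match the full-precision network P on them.

```python
import torch
import torch.nn.functional as F

def data_free_calibrate(generator, fp_model, q_model, steps=100, batch=32, z_dim=128):
    # Generic generator-based data-free calibration sketch. `generator`,
    # `fp_model` (P) and `q_model` (Q) are assumed to be given nn.Modules;
    # training of the generator itself (e.g. to match P's statistics or
    # adversarially against Q) is omitted here.
    opt = torch.optim.Adam(q_model.parameters(), lr=1e-4)
    fp_model.eval()
    generator.eval()
    for _ in range(steps):
        z = torch.randn(batch, z_dim)
        fake = generator(z).detach()            # G: fake samples from noise
        with torch.no_grad():
            p_logits = fp_model(fake)           # P: full-precision teacher
        q_logits = q_model(fake)                # Q: quantized student
        loss = F.kl_div(q_logits.log_softmax(-1), p_logits.softmax(-1),
                        reduction="batchmean")  # distillation loss on fake data
        opt.zero_grad()
        loss.backward()
        opt.step()
```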
IntraQ: Learning synthetic images with intra-class heterogeneity for zero-shot network quantization
Learning to synthesize data has emerged as a promising direction in zero-shot quantization (ZSQ), which represents neural networks with low-bit integers without accessing any of the real …
BiBench: Benchmarking and analyzing network binarization
Network binarization has emerged as one of the most promising compression approaches, offering extraordinary computation and memory savings by minimizing the bit-width …
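For reference, weight binarization is commonly realized as a sign function plus a per-channel scaling factor, in the style of XNOR-Net; a minimal sketch, not BiBench's own benchmark code:

```python
import torch

def binarize(weight: torch.Tensor) -> torch.Tensor:
    """Binarize a weight matrix: each entry becomes its sign, and a
    per-output-channel scale alpha preserves the average magnitude.
    Storage drops to 1 bit per weight plus one scale per row."""
    alpha = weight.abs().mean(dim=1, keepdim=True)  # per-row scale
    return alpha * torch.sign(weight)

w = torch.randn(64, 128)    # full-precision weight matrix
w_bin = binarize(w)         # values in {-alpha, +alpha} per row
```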