Q-Diffusion: Quantizing diffusion models

X Li, Y Liu, L Lian, H Yang, Z Dong… - Proceedings of the …, 2023 - openaccess.thecvf.com
Diffusion models have achieved great success in image synthesis through iterative noise
estimation using deep neural networks. However, the slow inference, high memory …

Outlier suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling

X Wei, Y Zhang, Y Li, X Zhang, R Gong, J Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
Post-training quantization (PTQ) of transformer language models faces significant
challenges due to the existence of detrimental outliers in activations. We observe that these …

OliVe: Accelerating large language models via hardware-friendly outlier-victim pair quantization

C Guo, J Tang, W Hu, J Leng, C Zhang… - Proceedings of the 50th …, 2023 - dl.acm.org
Transformer-based large language models (LLMs) have achieved great success with the
growing model size. LLMs' size grows by 240× every two years, which outpaces the …

IntraQ: Learning synthetic images with intra-class heterogeneity for zero-shot network quantization

Y Zhong, M Lin, G Nan, J Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Learning to synthesize data has emerged as a promising direction in zero-shot quantization
(ZSQ), which represents neural networks with low-bit integers without accessing any of the real …

NoisyQuant: Noisy bias-enhanced post-training activation quantization for vision transformers

Y Liu, H Yang, Z Dong, K Keutzer… - Proceedings of the …, 2023 - openaccess.thecvf.com
The complicated architecture and high training cost of vision transformers urge the
exploration of post-training quantization. However, the heavy-tailed distribution of vision …

Hard sample matters a lot in zero-shot quantization

H Li, X Wu, F Lv, D Liao, TH Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Zero-shot quantization (ZSQ) is promising for compressing and accelerating deep neural
networks when the data for training full-precision models are inaccessible. In ZSQ, network …

Diverse sample generation: Pushing the limit of generative data-free quantization

H Qin, Y Ding, X Zhang, J Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Generative data-free quantization emerges as a practical compression approach that
quantizes deep neural networks to low bit-width without accessing the real data. This …

DVABatch: Diversity-aware Multi-Entry Multi-Exit batching for efficient processing of DNN services on GPUs

W Cui, H Zhao, Q Chen, H Wei, Z Li, D Zeng… - 2022 USENIX Annual …, 2022 - usenix.org
DNN inferences are often batched to better utilize the hardware in existing DNN
serving systems. However, DNN serving exhibits diversity in many aspects, such as input …

A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models

C Guo, F Cheng, Z Du, J Kiessling, J Ku… - IEEE Circuits and …, 2025 - ieeexplore.ieee.org
The rapid development of large language models (LLMs) has significantly transformed the
field of artificial intelligence, demonstrating remarkable capabilities in natural language …

Sana: Efficient high-resolution image synthesis with linear diffusion transformers

E **e, J Chen, J Chen, H Cai, H Tang, Y Lin… - arxiv preprint arxiv …, 2024 - arxiv.org
We introduce Sana, a text-to-image framework that can efficiently generate images up to
4096×4096 resolution. Sana can synthesize high-resolution, high-quality images …