Q-Diffusion: Quantizing diffusion models
Diffusion models have achieved great success in image synthesis through iterative noise
estimation using deep neural networks. However, the slow inference, high memory …
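As a rough illustration of the post-training quantization building block such work starts from, here is a minimal simulated uniform quantizer; Q-Diffusion's actual scheme additionally calibrates with activations gathered across denoising timesteps, which this sketch omits, and the tensor here is illustrative.

```python
import numpy as np

def quantize_uniform(x: np.ndarray, n_bits: int = 8) -> np.ndarray:
    """Simulated symmetric uniform quantization (quantize-dequantize).

    A generic PTQ primitive, not Q-Diffusion's exact scheme.
    """
    qmax = 2 ** (n_bits - 1) - 1            # e.g. 127 for signed 8-bit
    scale = np.abs(x).max() / qmax          # per-tensor scale from the max
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                        # dequantize back to float

# Quantize one weight tensor of a (hypothetical) noise-estimation network
w = np.random.randn(64, 64).astype(np.float32)
w4 = quantize_uniform(w, n_bits=4)
print(float(np.abs(w - w4).mean()))         # mean round-off error
```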
Outlier suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
Post-training quantization (PTQ) of transformer language models faces significant
challenges due to the existence of detrimental outliers in activations. We observe that these …
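The "equivalent shifting and scaling" can be sketched directly: shift and scale the activations per channel, then fold the inverse transform into the following linear layer so the network output is mathematically unchanged. The heuristic choice of z and s below is an assumption; the paper searches for optimal values.

```python
import numpy as np

def shift_and_scale(X, W, b, z, s):
    """Equivalence behind channel-wise shifting/scaling (a sketch of the
    Outlier Suppression+ idea, not the paper's optimal z, s search).

    X: (n, c) activations, W: (c, d) next linear layer, b: (d,) bias,
    z: (c,) per-channel shift, s: (c,) per-channel scale (s > 0).
    Returns (X', W', b') with X' @ W' + b' == X @ W + b.
    """
    X_t = (X - z) / s            # easier-to-quantize activations
    W_t = W * s[:, None]         # fold the scale into the weights
    b_t = b + z @ W              # fold the shift into the bias
    return X_t, W_t, b_t

X = np.random.randn(4, 8); X[:, 0] += 50.0       # channel 0 is an outlier
W, b = np.random.randn(8, 3), np.random.randn(3)
z, s = X.mean(0), X.std(0) + 1e-6                # heuristic z, s (assumed)
X_t, W_t, b_t = shift_and_scale(X, W, b, z, s)
assert np.allclose(X @ W + b, X_t @ W_t + b_t)   # output is unchanged
```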
OliVe: Accelerating large language models via hardware-friendly outlier-victim pair quantization
Transformer-based large language models (LLMs) have achieved great success with the
growing model size. LLMs' size grows by 240× every two years, which outpaces the …
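A toy rendering of the outlier-victim pair idea follows; the actual work defines a hardware-friendly low-bit encoding for the outlier, while this float-level simulation only mimics sacrificing the adjacent "victim" value, and the outlier threshold is an assumption.

```python
import numpy as np

def olive_like_pairs(x, n_bits=4, thresh=None):
    """Toy simulation of outlier-victim pairs (not OliVe's encoding or
    hardware design). Values are processed in adjacent pairs: if one
    element is an outlier, its neighbour (the victim) is pruned to zero
    so the freed slot lets the outlier keep a wider range; normal pairs
    are uniformly quantized. Assumes an even number of elements.
    """
    qmax = 2 ** (n_bits - 1) - 1
    x = x.copy().reshape(-1, 2)                    # adjacent pairs
    if thresh is None:
        thresh = 3 * np.abs(x).mean()              # assumed outlier cutoff
    scale = thresh / qmax                          # scale for normal values
    out = np.clip(np.round(x / scale), -qmax, qmax) * scale
    for i, j in zip(*np.where(np.abs(x) > thresh)):
        out[i, 1 - j] = 0.0                        # sacrifice the victim
        out[i, j] = x[i, j]                        # keep the outlier
    return out.reshape(-1)

x = np.random.randn(16); x[5] = 12.0               # plant one outlier
print(olive_like_pairs(x, n_bits=4))
```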
IntraQ: Learning synthetic images with intra-class heterogeneity for zero-shot network quantization
Learning to synthesize data has emerged as a promising direction in zero-shot quantization
(ZSQ), which represents neural networks with low-bit integers without accessing any of the real …
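The data-synthesis recipe that IntraQ builds on can be sketched as optimizing noise images so a frozen full-precision network's feature statistics match its stored BatchNorm statistics; IntraQ's intra-class heterogeneity objective is omitted here, and the tiny untrained model is purely illustrative (in practice the network is pretrained).

```python
import torch
import torch.nn as nn

# Minimal BN-statistics-matching image synthesis for ZSQ calibration.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                      nn.BatchNorm2d(16), nn.ReLU())
model.eval()                                        # freeze the network
bn = model[1]
x = torch.randn(8, 3, 32, 32, requires_grad=True)   # synthetic images
opt = torch.optim.Adam([x], lr=0.1)

for _ in range(100):
    opt.zero_grad()
    h = model[0](x)                                  # pre-BN features
    mu, var = h.mean(dim=(0, 2, 3)), h.var(dim=(0, 2, 3))
    loss = ((mu - bn.running_mean) ** 2).sum() + \
           ((var - bn.running_var) ** 2).sum()       # match stored stats
    loss.backward()
    opt.step()
# x now serves as calibration data for quantizing the network.
```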
NoisyQuant: Noisy bias-enhanced post-training activation quantization for vision transformers
The complicated architecture and high training cost of vision transformers urge the
exploration of post-training quantization. However, the heavy-tailed distribution of vision …
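The core trick can be sketched in a few lines: sample a fixed noisy bias once per layer, add it before uniform quantization, and subtract it after dequantization (in deployment the subtraction folds into the next layer's bias). The noise magnitude below is an assumption, not the paper's tuned choice.

```python
import numpy as np

def noisy_quant(x, n_bits=6, rng=np.random.default_rng(0)):
    """Sketch of the NoisyQuant idea: a fixed per-channel noisy bias is
    added before uniform quantization and removed afterwards, flattening
    the quantization error of heavy-tailed activations.
    """
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    noise = rng.uniform(-scale / 2, scale / 2, size=x.shape[-1])
    q = np.clip(np.round((x + noise) / scale), -qmax - 1, qmax)
    return q * scale - noise                 # remove the fixed bias

x = np.random.standard_t(df=3, size=(4, 16))  # heavy-tailed activations
print(np.abs(x - noisy_quant(x)).mean())      # mean quantization error
```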
Hard sample matters a lot in zero-shot quantization
Zero-shot quantization (ZSQ) is promising for compressing and accelerating deep neural
networks when the data for training full-precision models are inaccessible. In ZSQ, network …
Diverse sample generation: Pushing the limit of generative data-free quantization
Generative data-free quantization emerges as a practical compression approach that
quantizes deep neural networks to low bit-width without accessing the real data. This …
DVABatch: Diversity-aware multi-entry multi-exit batching for efficient processing of DNN services on GPUs
DNN inferences are often batched to better utilize the hardware in existing DNN
serving systems. However, DNN serving exhibits diversity in many aspects, such as input …
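A toy scheduler in the spirit of multi-entry multi-exit batching follows; DVABatch's actual CUDA-level scheduling and policies are omitted, and the stage/exit fields are illustrative. Between model stages, the running batch admits queued queries (multi-entry) and releases finished ones (multi-exit) instead of holding every query for the full forward pass.

```python
from dataclasses import dataclass
from collections import deque

@dataclass
class Query:
    qid: int
    exit_stage: int            # early-exit point for this query

def serve(queries, n_stages=4, max_batch=4):
    """Run a staged model, letting queries join and leave at stage
    boundaries (a sketch, not DVABatch's scheduler)."""
    waiting, batch, done = deque(queries), [], []
    for stage in range(n_stages):
        while waiting and len(batch) < max_batch:
            batch.append(waiting.popleft())          # multi-entry
        # ... run `stage` of the model on `batch` here ...
        done += [(q.qid, stage) for q in batch
                 if q.exit_stage == stage]           # multi-exit
        batch = [q for q in batch if q.exit_stage > stage]
    return done

print(serve([Query(0, 1), Query(1, 3), Query(2, 2),
             Query(3, 3), Query(4, 3)]))
```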
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
The rapid development of large language models (LLMs) has significantly transformed the
field of artificial intelligence, demonstrating remarkable capabilities in natural language …
Sana: Efficient high-resolution image synthesis with linear diffusion transformers
We introduce Sana, a text-to-image framework that can efficiently generate images up to
4096×4096 resolution. Sana can synthesize high-resolution, high-quality images …
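The O(N) attention primitive that linear diffusion transformers rely on can be sketched as below; Sana's exact attention variant and feature map may differ, and ReLU is used here only as a simple non-negative kernel. The key design point is that keys and values are collapsed into a d×d summary once, so cost grows linearly in sequence length rather than quadratically.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Minimal linear attention: O(N * d^2) instead of softmax
    attention's O(N^2 * d). A sketch, not Sana's exact variant."""
    Q, K = np.maximum(Q, 0), np.maximum(K, 0)  # non-negative feature map
    KV = K.T @ V                               # (d, d) summary, built once
    Z = Q @ K.sum(axis=0)                      # per-query normalizer
    return (Q @ KV) / (Z[:, None] + eps)

N, d = 4096, 64                                # long token sequence
Q, K, V = (np.random.rand(N, d) for _ in range(3))
print(linear_attention(Q, K, V).shape)         # (N, d)
```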