Q-diffusion: Quantizing diffusion models

X Li, Y Liu, L Lian, H Yang, Z Dong… - Proceedings of the …, 2023 - openaccess.thecvf.com
Diffusion models have achieved great success in image synthesis through iterative noise
estimation using deep neural networks. However, the slow inference, high memory …

Olive: Accelerating large language models via hardware-friendly outlier-victim pair quantization

C Guo, J Tang, W Hu, J Leng, C Zhang… - Proceedings of the 50th …, 2023 - dl.acm.org
Transformer-based large language models (LLMs) have achieved great success as model sizes have grown. LLM size grows by 240× every two years, which outpaces the …

Outlier suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling

X Wei, Y Zhang, Y Li, X Zhang, R Gong, J Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
Post-training quantization (PTQ) of transformer language models faces significant
challenges due to the existence of detrimental outliers in activations. We observe that these …

Compressing large language models by joint sparsification and quantization

J Guo, J Wu, Z Wang, J Liu, G Yang, Y Ding… - … on Machine Learning, 2024 - openreview.net
In this paper, we introduce a novel model compression technique named Joint Sparsification
and Quantization (JSQ), explicitly tailored for large language models (LLMs). Traditional …

Noisyquant: Noisy bias-enhanced post-training activation quantization for vision transformers

Y Liu, H Yang, Z Dong, K Keutzer… - Proceedings of the …, 2023 - openaccess.thecvf.com
The complicated architecture and high training cost of vision transformers motivate the
exploration of post-training quantization. However, the heavy-tailed distribution of vision …

Intraq: Learning synthetic images with intra-class heterogeneity for zero-shot network quantization

Y Zhong, M Lin, G Nan, J Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Learning to synthesize data has emerged as a promising direction in zero-shot quantization
(ZSQ), which represents neural networks with low-bit integers without accessing any of the real …

Ant: Exploiting adaptive numerical data type for low-bit deep neural network quantization

C Guo, C Zhang, J Leng, Z Liu, F Yang… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Quantization is a technique to reduce the computation and memory cost of DNN models,
which are getting increasingly large. Existing quantization solutions use fixed-point integer …

Hard sample matters a lot in zero-shot quantization

H Li, X Wu, F Lv, D Liao, TH Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Zero-shot quantization (ZSQ) is promising for compressing and accelerating deep neural
networks when the data for training full-precision models are inaccessible. In ZSQ, network …

DVABatch: Diversity-aware Multi-Entry Multi-Exit batching for efficient processing of DNN services on GPUs

W Cui, H Zhao, Q Chen, H Wei, Z Li, D Zeng… - 2022 USENIX Annual …, 2022 - usenix.org
DNN inferences are often batched to better utilize the hardware in existing DNN serving
systems. However, DNN serving exhibits diversity in many aspects, such as input …
