How does quantization affect multilingual LLMs?

K Marchisio, S Dash, H Chen, D Aumiller… - arXiv preprint arXiv …, 2024 - arxiv.org
Quantization techniques are widely used to improve inference speed and deployment of
large language models. While a wide body of work examines the impact of quantization on …

VcLLM: Video Codecs are Secretly Tensor Codecs

C Xu, Y Wu, X Yang, B Chen, M Lentz, D Zhuo… - arXiv preprint arXiv …, 2024 - arxiv.org
As the parameter size of large language models (LLMs) continues to expand, the need for a
large memory footprint and high communication bandwidth have become significant …