Memory-efficient fine-tuning of compressed large language models via sub-4-bit integer quantization

J Kim, JH Lee, S Kim, J Park, KM Yoo… - Advances in Neural …, 2023 - proceedings.neurips.cc
Large language models (LLMs) face challenges in fine-tuning and deployment due to
their high memory demands and computational costs. While parameter-efficient fine-tuning …

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

Beyond efficiency: A systematic survey of resource-efficient large language models

G Bai, Z Chai, C Ling, S Wang, J Lu, N Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …

Resource-efficient algorithms and systems of foundation models: A survey

M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2025 - dl.acm.org
Large foundation models, including large language models, vision transformers, diffusion,
and large language model based multimodal models, are revolutionizing the entire machine …

PTQ4SAM: Post-training quantization for Segment Anything

C Lv, H Chen, J Guo, Y Ding… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Segment Anything Model (SAM) has achieved impressive performance in many
computer vision tasks. However, as a large-scale model, the immense memory and …

EfficientQAT: Efficient quantization-aware training for large language models

M Chen, W Shao, P Xu, J Wang, P Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are crucial in modern natural language processing and
artificial intelligence. However, they face challenges in managing their significant memory …

Model compression and efficient inference for large language models: A survey

W Wang, W Chen, Y Luo, Y Long, Z Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer-based large language models have achieved tremendous success. However,
the significant memory and computational costs incurred during the inference process make …

Nearest is not dearest: Towards practical defense against quantization-conditioned backdoor attacks

B Li, Y Cai, H Li, F Xue, Z Li… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Model quantization is widely used to compress and accelerate deep neural
networks. However, recent studies have revealed the feasibility of weaponizing model …

ShiftAddLLM: Accelerating pretrained LLMs via post-training multiplication-less reparameterization

H You, Y Guo, Y Fu, W Zhou, H Shi… - Advances in …, 2025 - proceedings.neurips.cc
Large language models (LLMs) have shown impressive performance on language tasks but
face challenges when deployed on resource-constrained devices due to their extensive …

Low-rank quantization-aware training for LLMs

Y Bondarenko, R Del Chiaro, M Nagel - arXiv preprint arXiv:2406.06385, 2024 - arxiv.org
Large language models (LLMs) are omnipresent; however, their practical deployment is
challenging due to their ever-increasing computational and memory demands. Quantization …