Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, J Zhang, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

EfficientQAT: Efficient quantization-aware training for large language models

M Chen, W Shao, P Xu, J Wang, P Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are crucial in modern natural language processing and
artificial intelligence. However, they face challenges in managing their significant memory …

Talking heads: Understanding inter-layer communication in transformer language models

J Merullo, C Eickhoff, E Pavlick - Advances in Neural …, 2025 - proceedings.neurips.cc
Although it is known that transformer language models (LMs) pass features from early layers
to later layers, it is not well understood how this information is represented and routed by the …

SVDQuant: Absorbing outliers by low-rank components for 4-bit diffusion models

M Li, Y Lin, Z Zhang, T Cai, X Li, J Guo, E Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have been proven highly effective at generating high-quality images.
However, as these models grow larger, they require significantly more memory and suffer …

A survey of low-bit large language models: Basics, systems, and algorithms

R Gong, Y Ding, Z Wang, C Lv, X Zheng, J Du… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have achieved remarkable advancements in natural
language processing, showcasing exceptional performance across various tasks. However …

Compressing large language models using low rank and low precision decomposition

R Saha, N Sagan, V Srivastava… - Advances in …, 2025 - proceedings.neurips.cc
The prohibitive sizes of Large Language Models (LLMs) today make it difficult to deploy
them on memory-constrained edge devices. This work introduces CALDERA -- a new …

Low-rank quantization-aware training for LLMs

Y Bondarenko, R Del Chiaro, M Nagel - arXiv preprint arXiv:2406.06385, 2024 - arxiv.org
Large language models (LLMs) are omnipresent; however, their practical deployment is
challenging due to their ever-increasing computational and memory demands. Quantization …

Lottery ticket adaptation: Mitigating destructive interference in LLMs

A Panda, B Isik, X Qi, S Koyejo, T Weissman… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing methods for adapting large language models (LLMs) to new tasks are not suited to
multi-task adaptation because they modify all the model weights--causing destructive …

A fine-tuning enhanced RAG system with quantized influence measure as AI judge

K Rangan, Y Yin - Scientific Reports, 2024 - nature.com
This study presents an innovative enhancement to retrieval-augmented generation (RAG)
systems by seamlessly integrating fine-tuned large language models (LLMs) with vector …

Fast matrix multiplications for lookup table-quantized LLMs

H Guo, W Brandon, R Cholakov… - arXiv preprint arXiv …, 2024 - arxiv.org
The deployment of large language models (LLMs) is often constrained by memory
bandwidth, where the primary bottleneck is the cost of transferring model parameters from …