A survey on LoRA of large language models

Y Mao, Y Ge, Y Fan, W Xu, Y Mi, Z Hu… - Frontiers of Computer …, 2025 - Springer
Abstract: Low-Rank Adaptation (LoRA), which updates the dense neural network layers with
pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning …
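
For context, the mechanism the survey covers is simple: LoRA freezes a pretrained weight matrix W and learns a low-rank correction BA, so the adapted layer computes Wx + (alpha/r)·BAx with B of shape d×r, A of shape r×k, and r much smaller than d or k. A minimal sketch, assuming PyTorch; the class name LoRALinear and the hyperparameter values are illustrative, not taken from the survey:

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # Frozen pretrained linear layer plus a pluggable low-rank update B @ A.
        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False              # pretrained weights stay frozen
            d_out, d_in = base.out_features, base.in_features
            self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # small random init for A
            self.B = nn.Parameter(torch.zeros(d_out, r))        # zero init for B
            self.scaling = alpha / r

        def forward(self, x):
            # y = W x + (alpha / r) * B A x; only A and B receive gradients.
            return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

    layer = LoRALinear(nn.Linear(768, 768), r=8)
    y = layer(torch.randn(4, 768))                   # adapted forward pass

Because only A and B are trained, the adapter adds r·(d_out + d_in) parameters per layer and can be merged back into W after fine-tuning, which is what makes it "pluggable".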

The impact of initialization on LoRA finetuning dynamics

S Hayou, N Ghosh, B Yu - Advances in Neural Information …, 2025 - proceedings.neurips.cc
In this paper, we study the role of initialization in Low-Rank Adaptation (LoRA) as originally
introduced in Hu et al. (2021). Essentially, to start from the pretrained model, one can either …
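
The two standard choices this sentence alludes to both keep the product BA at zero initially by zeroing exactly one factor and drawing the other at random. A minimal sketch of both schemes, assuming PyTorch and the W + BA parameterization sketched above; shapes and scale factors are illustrative:

    import torch

    d_out, d_in, r = 768, 768, 8     # illustrative layer sizes and rank

    # Scheme 1: A random, B zero (the common default). The update B @ A starts at
    # zero, and gradients reach both factors from the first step.
    A = torch.randn(r, d_in) / r ** 0.5
    B = torch.zeros(d_out, r)

    # Scheme 2: B random, A zero. The product is still zero at initialization, but
    # the resulting finetuning dynamics differ, which is what the paper analyzes.
    A_alt = torch.zeros(r, d_in)
    B_alt = torch.randn(d_out, r) / r ** 0.5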

S²FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity

X Yang, J Leng, G Guo, J Zhao… - Advances in …, 2025 - proceedings.neurips.cc
Current PEFT methods for LLMs can achieve high quality, efficient training, or scalable
serving, but not all three simultaneously. To address this limitation, we investigate sparse …

Structured Unrestricted-Rank Matrices for Parameter Efficient Finetuning

A Sehanobish, KA Dubey… - Advances in …, 2025 - proceedings.neurips.cc
Recent efforts to scale Transformer models have demonstrated rapid progress across a wide
range of tasks (Wei et al. 2022). However, fine-tuning these models for downstream tasks is …

Prompt compression for large language models: A survey

Z Li, Y Liu, Y Su, N Collier - arXiv preprint arXiv:2410.12388, 2024 - arxiv.org
Leveraging large language models (LLMs) for complex natural language tasks typically
requires long-form prompts to convey detailed requirements and information, which results …

Tensor Product Attention Is All You Need

Y Zhang, Y Liu, H Yuan, Z Qin, Y Yuan, Q Gu… - arXiv preprint arXiv …, 2025 - arxiv.org
Scaling language models to handle longer input sequences typically necessitates large key-
value (KV) caches, resulting in substantial memory overhead during inference. In this paper …

PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning

Q Wang, X Hu, W Xu, W Liu, J Luan, B Wang - arXiv preprint arXiv …, 2024 - arxiv.org
Low-rank adaptation (LoRA) and its variants have recently gained much interest due to their
ability to avoid excessive inference costs. However, LoRA still encounters the following …

SLIM: Let LLM learn more and forget less with soft LoRA and identity mixture

J Han, L Du, H Du, X Zhou, Y Wu, W Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Although many efforts have been made, it is still a challenge to balance the training budget,
downstream performance, and the general capabilities of the LLMs in many applications …

Accurate and efficient fine-tuning of quantized large language models through optimal balance

A Shen, Q Wang, Z Lai, X Li, D Li - arXiv preprint arXiv:2407.17029, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated impressive performance across various
domains. However, the enormous number of model parameters makes fine-tuning …

MedCare: Advancing medical LLMs through decoupling clinical alignment and knowledge aggregation

Y Liao, S Jiang, Z Chen, Y Wang, Y Wang - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have shown substantial progress in natural language
understanding and generation, proving especially valuable in the medical field. Despite …