A survey on LoRA of large language models

Y Mao, Y Ge, Y Fan, W Xu, Y Mi, Z Hu… - Frontiers of Computer …, 2025 - Springer
Abstract Low-Rank Adaptation (LoRA), which updates the dense neural network layers with
pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning …
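
A minimal sketch of the mechanism the snippet describes: a dense layer is kept frozen and a pluggable pair of low-rank factors B and A is trained, so the effective weight becomes W + (alpha/r)·BA. The rank, scaling, and layer size below are illustrative choices, not values from the survey.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen dense layer plus a pluggable low-rank update: y = x W^T + (alpha/r) x (BA)^T."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                               # dense weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(768, 768))
print(layer(torch.randn(2, 768)).shape)                           # torch.Size([2, 768])
```

Only A and B receive gradients, which is what makes the adapter "pluggable": it can be stored, swapped, or merged into the base weight after training.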

Holmes ⌕ A Benchmark to Assess the Linguistic Competence of Language Models

A Waldis, Y Perlitz, L Choshen, Y Hou… - Transactions of the …, 2024 - direct.mit.edu
We introduce Holmes, a new benchmark designed to assess language models' (LMs')
linguistic competence—their unconscious understanding of linguistic phenomena …

MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router

Y **e, Z Zhang, D Zhou, C **e, Z Song, X Liu… - arxiv preprint arxiv …, 2024 - arxiv.org
Mixture-of-Experts (MoE) architectures face challenges such as high memory consumption
and redundancy in experts. Pruning MoE can reduce network weights while maintaining …
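
One way to read "the hints from its router" is to rank experts by the probability mass the router assigns them on calibration tokens and drop the least-used ones. The toy sketch below shows only that reading; the expert-level criterion, shapes, and calibration setup are assumptions, not MoE-Pruner's algorithm.

```python
import torch

# Toy router-guided pruning: 8 experts, router logits from a small calibration batch.
# Assumption (not the paper's method): experts with the lowest average router
# probability are removed entirely.
torch.manual_seed(0)
num_experts, keep = 8, 4
router_logits = torch.randn(1024, num_experts)        # [calibration tokens, experts]
gate_probs = torch.softmax(router_logits, dim=-1)
expert_importance = gate_probs.mean(dim=0)             # average routing weight per expert

kept = torch.topk(expert_importance, k=keep).indices.sort().values
print("keeping experts:", kept.tolist())

expert_weights = torch.randn(num_experts, 4096, 1024)  # one FFN weight matrix per expert
pruned_weights = expert_weights[kept]                  # drop the low-traffic experts
print("pruned tensor shape:", tuple(pruned_weights.shape))
```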

Compress then serve: Serving thousands of LoRA adapters with little overhead

R Brüel-Gabrielsson, J Zhu, O Bhardwaj… - arxiv preprint arxiv …, 2024 - arxiv.org
Fine-tuning large language models (LLMs) with low-rank adaptations (LoRAs) has become
common practice, often yielding numerous copies of the same LLM differing only in their …
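
The serving pattern behind such deployments is a single frozen base weight shared by everyone, with each request selecting only a small (B, A) pair. A rough sketch with made-up adapter names and sizes, not the paper's serving system or its compression step:

```python
import torch

d, r = 512, 8
base_weight = torch.randn(d, d)                        # shared by all adapters

# Each adapter adds only r * 2d parameters on top of the shared base.
adapters = {
    "adapter_a": (torch.zeros(d, r), torch.randn(r, d) * 0.01),
    "adapter_b": (torch.zeros(d, r), torch.randn(r, d) * 0.01),
}

def serve(x: torch.Tensor, adapter_id: str) -> torch.Tensor:
    B, A = adapters[adapter_id]                        # pick the per-request low-rank pair
    return x @ base_weight.T + x @ (B @ A).T

print(serve(torch.randn(1, d), "adapter_a").shape)     # torch.Size([1, 512])
```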

Asymmetry in low-rank adapters of foundation models

J Zhu, K Greenewald, K Nadjahi, HSO Borde… - arxiv preprint arxiv …, 2024 - arxiv.org
Parameter-efficient fine-tuning optimizes large, pre-trained foundation models by updating a
subset of parameters; in this class, Low-Rank Adaptation (LoRA) is particularly effective …

Exploring Quantization Techniques for Large-Scale Language Models: Methods, Challenges and Future Directions

A Shen, Z Lai, D Li - Proceedings of the 2024 9th International …, 2024 - dl.acm.org
Breakthroughs in natural language processing (NLP) by large-scale language models
(LLMs) have led to superior performance in multilingual tasks such as translation …
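
For orientation on the kind of techniques such a survey covers, the simplest form is symmetric per-tensor int8 post-training quantization of a weight matrix; the sketch below is that generic baseline, not any specific method discussed in the paper.

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor quantization: one float scale, int8 values."""
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(1024, 1024)
q, scale = quantize_int8(w)
print("storage: 4 bytes -> 1 byte per weight")
print("max abs error:", (w - dequantize(q, scale)).abs().max().item())
```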

Federated LoRA with Sparse Communication

K Kuo, A Raje, K Rajesh, V Smith - arxiv preprint arxiv:2406.05233, 2024 - arxiv.org
Low-rank adaptation (LoRA) is a natural method for finetuning in communication-
constrained machine learning settings such as cross-device federated learning. Prior work …
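
The communication-constrained setting can be pictured as each client fine-tuning a small LoRA factor locally and sending only a sparsified version of it for the server to average. The sketch below is a toy illustration with an assumed top-k sparsity rule and shapes, not the protocol studied in the paper.

```python
import torch

d, r, k = 256, 4, 200

def sparsify(update: torch.Tensor, k: int) -> torch.Tensor:
    """Keep only the k largest-magnitude entries; everything else is not transmitted."""
    flat = update.flatten()
    idx = flat.abs().topk(k).indices
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(update)

# Stand-ins for each client's locally trained LoRA delta.
client_updates = [torch.randn(d, r) * 0.01 for _ in range(3)]
sent = [sparsify(u, k) for u in client_updates]        # what actually crosses the network
server_avg = torch.stack(sent).mean(dim=0)
print("nonzeros sent per client:", int((sent[0] != 0).sum()), "of", d * r)
```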

Lossless and Near-Lossless Compression for Foundation Models

M Hershcovitch, L Choshen, A Wood, I Enmouri… - arxiv preprint arxiv …, 2024 - arxiv.org
With the growth of model sizes and the scale of their deployment, their sheer size burdens the
infrastructure, requiring more network bandwidth and more storage to accommodate them. While there …
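
The underlying observation is that checkpoints are just bytes, so a lossless codec can shrink them on disk and restore them bit-for-bit. The sketch below uses zlib as a generic stand-in; it says nothing about the codecs or float-aware byte layouts the paper actually evaluates.

```python
import zlib
import numpy as np

weights = np.random.randn(1_000_000).astype(np.float32)
raw = weights.tobytes()
compressed = zlib.compress(raw)

# Lossless: the decompressed bytes reproduce the original tensor exactly.
restored = np.frombuffer(zlib.decompress(compressed), dtype=np.float32)
assert np.array_equal(weights, restored)

# Random floats barely compress; trained weight tensors expose more byte-level
# redundancy (e.g., clustered exponents), which specialized codecs exploit.
print(f"compressed/raw ratio: {len(compressed) / len(raw):.3f}")
```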

Unforgettable Generalization in Language Models

E Zhang, L Choshen, J Andreas - First Conference on Language …, 2024 - openreview.net
When language models (LMs) are trained to "unlearn" a skill, does this unlearning
generalize? We study the behavior of LMs after being fine-tuned on data for a target task (e.g., …

Towards maintainable machine learning development through continual and modular learning

O Ostapenko - 2024 - papyrus.bib.umontreal.ca
As machine learning models grow in size and complexity, their maintainability becomes a
critical concern, especially when they are increasingly deployed in dynamic, real-world …