A survey on LoRA of large language models
Y Mao, Y Ge, Y Fan, W Xu, Y Mi, Z Hu… - Frontiers of Computer …, 2025 - Springer
Low-Rank Adaptation (LoRA), which updates the dense neural network layers with
pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning …
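The survey entry above describes LoRA as adding pluggable low-rank matrices to frozen dense layers. As a minimal sketch of that idea (not code from the survey; the dimensions, names, and scaling factor below are illustrative assumptions), the effective weight becomes W + B·A, where only the small factors A and B are trained:

```python
# Minimal LoRA sketch: a dense layer with a pluggable low-rank update.
# W is frozen; only the low-rank factors A and B would be trained.
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Forward pass through a dense layer augmented with a LoRA update.

    x : (batch, d_in) input
    W : (d_out, d_in) frozen pretrained weight
    A : (r, d_in)     trainable low-rank factor (r << min(d_in, d_out))
    B : (d_out, r)    trainable low-rank factor, typically zero-initialized
    """
    # Effective weight is W + alpha * (B @ A); the base weight stays untouched.
    return x @ (W + alpha * (B @ A)).T

# Toy usage with illustrative dimensions.
d_in, d_out, r = 16, 8, 2
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable
B = np.zeros((d_out, r))                 # trainable; zero init makes the initial update a no-op
x = rng.normal(size=(4, d_in))
print(lora_forward(x, W, A, B).shape)    # (4, 8)
```

Because B starts at zero, the adapted model initially matches the pretrained one, and the adapter can be merged into W or detached after training.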
Holmes ⌕ A Benchmark to Assess the Linguistic Competence of Language Models
We introduce Holmes, a new benchmark designed to assess language models' (LMs')
linguistic competence—their unconscious understanding of linguistic phenomena …
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
Mixture-of-Experts (MoE) architectures face challenges such as high memory consumption
and redundancy in experts. Pruning MoE can reduce network weights while maintaining …
Compress then serve: Serving thousands of LoRA adapters with little overhead
Fine-tuning large language models (LLMs) with low-rank adaptations (LoRAs) has become
common practice, often yielding numerous copies of the same LLM differing only in their …
Asymmetry in low-rank adapters of foundation models
Parameter-efficient fine-tuning optimizes large, pre-trained foundation models by updating a
subset of parameters; in this class, Low-Rank Adaptation (LoRA) is particularly effective …
Exploring Quantization Techniques for Large-Scale Language Models: Methods, Challenges and Future Directions
A Shen, Z Lai, D Li - Proceedings of the 2024 9th International …, 2024 - dl.acm.org
Breakthroughs in natural language processing (NLP) by large-scale language models
(LLMs) have led to superior performance in multilingual tasks such as translation …
Federated LoRA with Sparse Communication
Low-rank adaptation (LoRA) is a natural method for finetuning in communication-
constrained machine learning settings such as cross-device federated learning. Prior work …
Lossless and Near-Lossless Compression for Foundation Models
With the growth of model sizes and the scale of their deployment, their sheer size burdens the
infrastructure, requiring more network bandwidth and more storage to accommodate them. While there …
Unforgettable Generalization in Language Models
When language models (LMs) are trained to "unlearn" a skill, does this unlearning
generalize? We study the behavior of LMs after being fine-tuned on data for a target task (e.g., …
Towards maintainable machine learning development through continual and modular learning
O Ostapenko - 2024 - papyrus.bib.umontreal.ca
As machine learning models grow in size and complexity, their maintainability becomes a
critical concern, especially when they are increasingly deployed in dynamic, real-world …