What Matters for Model Merging at Scale?

P Yadav, T Vu, J Lai, A Chronopoulou… - arXiv preprint arXiv …, 2024 - arxiv.org
Model merging aims to combine multiple expert models into a more capable single model,
offering benefits such as reduced storage and serving costs, improved generalization, and …
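
For context, the baseline operator this line of work builds on is uniform weight averaging of expert checkpoints. Below is a minimal sketch in Python, assuming the experts share one base architecture and are given as state dicts; the function name is illustrative, not from the paper.

import torch

def average_merge(experts):
    # Uniform average of matching parameters across expert state dicts.
    merged = {}
    for key in experts[0]:
        merged[key] = torch.stack([e[key].float() for e in experts]).mean(dim=0)
    return merged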

SMILE: Zero-shot sparse mixture of low-rank experts construction from pre-trained foundation models

A Tang, L Shen, Y Luo, S **e, H Hu, L Zhang… - arxiv preprint arxiv …, 2024 - arxiv.org
Deep model training on extensive datasets is increasingly becoming cost-prohibitive,
prompting the widespread adoption of deep model fusion techniques to leverage knowledge …

Compress then serve: Serving thousands of LoRA adapters with little overhead

R Brüel-Gabrielsson, J Zhu, O Bhardwaj… - arXiv preprint arXiv …, 2024 - arxiv.org
Fine-tuning large language models (LLMs) with low-rank adaptations (LoRAs) has become
common practice, often yielding numerous copies of the same LLM differing only in their …
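
The setting described here rests on the LoRA parameterization: each fine-tuned copy equals the shared base weight plus a low-rank update, W' = W + (alpha/r) * B A, so only the small factors A and B differ per adapter. A minimal sketch with illustrative sizes (not taken from the paper):

import torch

d, r, alpha = 4096, 16, 32             # illustrative sizes, not from the paper
W = torch.randn(d, d)                  # frozen base weight, shared by all adapters
A = torch.randn(r, d) * 0.01           # low-rank factor A, stored per adapter
B = torch.zeros(d, r)                  # low-rank factor B, stored per adapter
W_adapted = W + (alpha / r) * (B @ A)  # effective weight for one adapter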

Efficient and effective weight-ensembling mixture of experts for multi-task model merging

L Shen, A Tang, E Yang, G Guo, Y Luo, L Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-task learning (MTL) leverages a shared model to accomplish multiple tasks and
facilitate knowledge transfer. Recent research on task arithmetic-based MTL demonstrates …
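
Task arithmetic, the technique the snippet names, forms a task vector per expert (fine-tuned minus pretrained weights) and adds their scaled sum back to the pretrained checkpoint. A minimal sketch over state dicts; the scaling coefficient lam is an assumed default, not a value from the paper:

import torch

def task_arithmetic_merge(pretrained, finetuned, lam=0.3):
    merged = {}
    for key in pretrained:
        # Task vector: how each expert moved away from the pretrained weights.
        deltas = torch.stack([ft[key] - pretrained[key] for ft in finetuned])
        merged[key] = pretrained[key] + lam * deltas.sum(dim=0)
    return merged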

SoK: On finding common ground in loss landscapes using deep model merging techniques

A Khan, T Nief, N Hudson, M Sakarvadia… - arXiv preprint arXiv …, 2024 - arxiv.org
Understanding neural networks is crucial to creating reliable and trustworthy deep learning
models. Most contemporary research in interpretability analyzes just one model at a time via …

Merging LoRAs like playing LEGO: Pushing the modularity of LoRA to extremes through rank-wise clustering

Z Zhao, T Shen, D Zhu, Z Li, J Su, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Low-Rank Adaptation (LoRA) has emerged as a popular technique for fine-tuning large
language models (LLMs) to various domains due to its modular design and widespread …

SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery

E Yang, L Shen, Z Wang, G Guo, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Model merging-based multitask learning (MTL) offers a promising approach for performing
MTL by merging multiple expert models without requiring access to raw training data …

Collective model intelligence requires compatible specialization

J Pari, S Jelassi, P Agrawal - arXiv preprint arXiv:2411.02207, 2024 - arxiv.org
In this work, we explore the limitations of combining models by averaging intermediate
features, referred to as model merging, and propose a new direction for achieving collective …
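
As a reference point for the setup being critiqued, feature averaging runs matching layers of two models and averages their activations at each depth. The layer-wise interface below is an assumption for illustration, not the paper's code:

def feature_average_forward(layers_a, layers_b, x):
    # Average intermediate features of two models, layer by layer.
    h = x
    for layer_a, layer_b in zip(layers_a, layers_b):
        h = 0.5 * (layer_a(h) + layer_b(h))
    return h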

How to Merge Your Multimodal Models Over Time?

S Dziadzio, V Udandarao, K Roth, A Prabhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Model merging combines multiple expert models, each finetuned from a base foundation model
on diverse tasks and domains, into a single, more capable model. However, most existing …

Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging

A Tang, E Yang, L Shen, Y Luo, H Hu, B Du… - arXiv preprint arXiv …, 2025 - arxiv.org
Deep model merging represents an emerging research direction that combines multiple
fine-tuned models to harness their specialized capabilities across different tasks and domains …
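
One simple instantiation of sequential merging is a running average that folds the t-th arriving model into the current merge, so no earlier checkpoints need to be kept. This sketch is illustrative and not necessarily the paper's exact update rule:

def sequential_merge_step(merged, incoming, t):
    # Incremental mean: t counts the incoming model as the t-th merged model.
    return {k: merged[k] + (incoming[k] - merged[k]) / t for k in merged}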