Model merging in LLMs, MLLMs, and beyond: Methods, theories, applications and opportunities

E Yang, L Shen, G Guo, X Wang, X Cao… - arXiv preprint arXiv …, 2024 - arxiv.org
Model merging is an efficient empowerment technique in the machine learning community
that does not require the collection of raw training data and does not require expensive …

What Matters for Model Merging at Scale?

P Yadav, T Vu, J Lai, A Chronopoulou… - arXiv preprint arXiv …, 2024 - arxiv.org
Model merging aims to combine multiple expert models into a more capable single model,
offering benefits such as reduced storage and serving costs, improved generalization, and …

From lists to emojis: How format bias affects model alignment

X Zhang, W Xiong, L Chen, T Zhou, H Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we study format biases in reinforcement learning from human feedback
(RLHF). We observe that many widely-used preference models, including human …

How to Merge Your Multimodal Models Over Time?

S Dziadzio, V Udandarao, K Roth, A Prabhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Model merging combines multiple expert models, finetuned from a base foundation model
on diverse tasks and domains, into a single, more capable model. However, most existing …

Parameter-Efficient Interventions for Enhanced Model Merging

M Osial, D Marczak, B Zieliński - arXiv preprint arXiv:2412.17023, 2024 - arxiv.org
Model merging combines knowledge from task-specific models into a unified multi-task
model to avoid joint training on all task data. However, current methods face challenges due …

If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs

M Khalifa, YC Tan, A Ahmadian, T Hosking… - arXiv preprint arXiv …, 2024 - arxiv.org
Model merging has shown great promise at combining expert models, but the benefit of
merging is unclear when merging "generalist" models trained on many tasks. We explore …

Domain Adaptation for Robust Model Routing

C Dann, Y Mansour, TV Marinov, M Mohri - Adaptive Foundation Models … - openreview.net
The rapid proliferation of domain-specialized machine learning models presents a
challenge: while individual models excel in specific domains, their performance varies …

LOCMAP: Low-Compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation

AP FRONTS - openreview.net
Model merging has emerged as an effective approach to combine multiple single-task
models into a multitask model. This process typically involves computing a weighted …