Model merging in LLMs, MLLMs, and beyond: Methods, theories, applications and opportunities

E Yang, L Shen, G Guo, X Wang, X Cao… - arXiv preprint arXiv …, 2024 - arxiv.org
Model merging is an efficient empowerment technique in the machine learning community
that does not require collecting raw training data, nor does it require expensive …

Deep model fusion: A survey

W Li, Y Peng, M Zhang, L Ding, H Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
Deep model fusion/merging is an emerging technique that merges the parameters or
predictions of multiple deep learning models into a single one. It combines the abilities of …

Transformer fusion with optimal transport

M Imfeld, J Graldi, M Giordano, T Hofmann… - arXiv preprint arXiv …, 2023 - arxiv.org
Fusion is a technique for merging multiple independently-trained neural networks in order to
combine their capabilities. Past attempts have been restricted to the case of fully-connected …

Sparse model soups: A recipe for improved pruning via model averaging

M Zimmer, C Spiegel, S Pokutta - arXiv preprint arXiv:2306.16788, 2023 - arxiv.org
Neural networks can be significantly compressed by pruning, yielding sparse models that
require considerably less storage and fewer floating-point operations while maintaining …
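The averaging ingredient behind soup-style methods can be illustrated with a uniform "model soup": element-wise averaging of the parameters of several checkpoints. This is a generic sketch of parameter averaging, not the paper's sparsity-preserving recipe; the function and dictionary layout are illustrative.

```python
# Illustrative sketch: uniform model soup over checkpoints stored as
# {parameter_name: value} dictionaries. Real models would hold tensors,
# but the averaging logic is the same element-wise operation.

def model_soup(checkpoints):
    """Average the parameters of several checkpoints (uniform soup)."""
    n = len(checkpoints)
    keys = checkpoints[0].keys()
    return {k: sum(c[k] for c in checkpoints) / n for k in keys}


if __name__ == "__main__":
    soup = model_soup([{"w": 1.0, "b": 0.2}, {"w": 3.0, "b": -0.2}])
    print(soup)  # averaged parameters
```

With sparse (pruned) models, naive averaging can densify the result, which is precisely the complication sparsity-aware recipes have to address.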

Localize-and-stitch: Efficient model merging via sparse task arithmetic

Y He, Y Hu, Y Lin, T Zhang, H Zhao - arXiv preprint arXiv:2408.13656, 2024 - arxiv.org
Model merging offers an effective strategy to combine the strengths of multiple finetuned
models into a unified model that preserves the specialized capabilities of each. Existing …

Training neural networks from scratch with parallel low-rank adapters

M Huh, B Cheung, J Bernstein, P Isola… - arXiv preprint arXiv …, 2024 - arxiv.org
The scalability of deep learning models is fundamentally limited by computing resources,
memory, and communication. Although methods like low-rank adaptation (LoRA) have …
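As background, the low-rank adaptation the abstract refers to replaces a full weight update with a product of two small matrices. The sketch below shows a standard LoRA-style forward pass, not the paper's parallel-adapter training scheme; the function name, `scale` parameter, and rank choice are illustrative.

```python
import numpy as np

# Illustrative LoRA-style forward pass: the frozen weight W (d_in, d_out)
# is augmented by a low-rank update A @ B, where A is (d_in, r) and
# B is (r, d_out) with r << min(d_in, d_out). Only A and B are trained.

def lora_forward(x, W, A, B, scale=1.0):
    """Compute x @ (W + scale * A @ B) without materializing the sum."""
    return x @ W + scale * (x @ A) @ B


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 4))
    W = rng.standard_normal((4, 3))
    A = rng.standard_normal((4, 2))   # rank-2 adapter
    B = rng.standard_normal((2, 3))
    y = lora_forward(x, W, A, B)
    print(y.shape)
```

Keeping the low-rank factors separate means the adapter adds only `r * (d_in + d_out)` trainable parameters instead of `d_in * d_out`.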

Cool-Fusion: Fuse large language models without training

C Liu, X Quan, Y Pan, L Lin, W Wu, X Chen - arXiv preprint arXiv …, 2024 - arxiv.org
We focus on the problem of fusing two or more heterogeneous large language models
(LLMs) to exploit their complementary strengths. One of the challenges of model fusion is …

ATM: Improving model merging by alternating tuning and merging

L Zhou, D Solombrino, D Crisostomi… - arXiv preprint arXiv …, 2024 - arxiv.org
Model merging has recently emerged as a cost-efficient paradigm for multi-task learning.
Among current approaches, task arithmetic stands out for its simplicity and effectiveness. In …

How Good is a Single Basin?

K Lion, L Noci, T Hofmann… - … Conference on Artificial …, 2024 - proceedings.mlr.press
The multi-modal nature of neural loss landscapes is often considered to be the main driver
behind the empirical success of deep ensembles. In this work, we probe this belief by …

A Second-Order Perspective on Compositionality and Incremental Learning

A Porrello, L Bonicelli, P Buzzega, M Millunzi… - arXiv preprint arXiv …, 2024 - arxiv.org
The fine-tuning of deep pre-trained models has recently revealed compositional properties.
This enables the arbitrary composition of multiple specialized modules into a single, multi …