Model merging in LLMs, MLLMs, and beyond: Methods, theories, applications and opportunities
Model merging is an efficient empowerment technique in the machine learning community
that requires neither the collection of raw training data nor expensive …
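The simplest instance of such merging is uniform parameter averaging of checkpoints fine-tuned from one base model. A minimal sketch, assuming compatible PyTorch state_dicts (the function name is illustrative):

```python
# Minimal sketch: uniform parameter averaging of fine-tuned checkpoints.
# Assumes all state_dicts come from the same architecture; practical
# methods reweight or align entries rather than taking a plain mean.
import torch

def average_state_dicts(state_dicts):
    """Element-wise mean over a list of compatible state_dicts."""
    merged = {}
    for key in state_dicts[0]:
        merged[key] = torch.stack(
            [sd[key].float() for sd in state_dicts]
        ).mean(dim=0)
    return merged

# Usage: model.load_state_dict(average_state_dicts([sd_a, sd_b, sd_c]))
```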
Deep model fusion: A survey
Deep model fusion/merging is an emerging technique that merges the parameters or
predictions of multiple deep learning models into a single one. It combines the abilities of …
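The snippet distinguishes fusing parameters from fusing predictions; the latter is plain ensembling. A hedged sketch of prediction-level fusion (parameter-level fusion is sketched above):

```python
# Prediction-level fusion: average class probabilities across models.
# Contrast with parameter-level fusion, which averages the weights.
import torch

@torch.no_grad()
def fuse_predictions(models, x):
    probs = [torch.softmax(model(x), dim=-1) for model in models]
    return torch.stack(probs).mean(dim=0)
```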
Transformer fusion with optimal transport
Fusion is a technique for merging multiple independently-trained neural networks in order to
combine their capabilities. Past attempts have been restricted to the case of fully-connected …
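Fusing independently trained networks usually requires aligning neurons before averaging, since hidden units are only defined up to permutation. The sketch below uses a hard Hungarian matching as a stand-in for the paper's soft optimal-transport alignment; the function and its similarity cost are illustrative choices:

```python
# Align-then-average for one linear layer of two independently trained
# nets. Optimal transport generalizes this hard matching to soft
# transport maps; a Hungarian assignment is the simplest aligner.
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_and_average(w_a, w_b):
    """w_a, w_b: (out_dim, in_dim) weights of the same layer."""
    cost = -w_a @ w_b.T                    # negative neuron similarity
    _, cols = linear_sum_assignment(cost)  # best B-neuron for each A-neuron
    return 0.5 * (w_a + w_b[cols])         # permute B's rows, then average
```

In a full network the same permutation must also be applied to the input columns of the next layer so that function composition is preserved.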
Sparse model soups: A recipe for improved pruning via model averaging
Neural networks can be significantly compressed by pruning, leading to sparse models
requiring considerably less storage and floating-point operations while maintaining …
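A hedged sketch of the averaging step, assuming the candidate sparse models share one pruning mask so that their mean stays sparse (the `masks` bookkeeping is illustrative):

```python
# Average several sparse checkpoints that share a pruning mask; the
# mask is re-applied so the merged model keeps the original sparsity.
import torch

def sparse_soup(state_dicts, masks):
    merged = {}
    for key in state_dicts[0]:
        mean = torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
        merged[key] = mean * masks[key] if key in masks else mean
    return merged
```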
Localize-and-stitch: Efficient model merging via sparse task arithmetic
Model merging offers an effective strategy to combine the strengths of multiple finetuned
models into a unified model that preserves the specialized capabilities of each. Existing …
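In the spirit of the title, a sparse task-arithmetic sketch: "localize" each task to a few high-magnitude task-vector entries, then "stitch" the sparse deltas into the base. The top-k magnitude rule below is an illustrative stand-in for the paper's localization procedure:

```python
import torch

def sparse_task_vector(base, finetuned, keep_ratio=0.05):
    """Keep only the largest-magnitude entries of (finetuned - base)."""
    sparse = {}
    for k in base:
        d = finetuned[k] - base[k]
        k_top = max(1, int(keep_ratio * d.numel()))
        thresh = d.abs().flatten().topk(k_top).values.min()
        sparse[k] = torch.where(d.abs() >= thresh, d, torch.zeros_like(d))
    return sparse

def stitch(base, sparse_vectors):
    """Add the sparse per-task deltas back onto the base model."""
    return {k: base[k] + sum(sv[k] for sv in sparse_vectors) for k in base}
```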
Training neural networks from scratch with parallel low-rank adapters
The scalability of deep learning models is fundamentally limited by computing resources,
memory, and communication. Although methods like low-rank adaptation (LoRA) have …
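For reference, the low-rank adaptation the abstract builds on: a frozen dense weight is perturbed by a trainable rank-r product, so only O(r(d_in + d_out)) parameters train. A minimal sketch (hyperparameters illustrative):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update B @ A."""
    def __init__(self, linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = linear
        for p in self.base.parameters():
            p.requires_grad_(False)          # freeze the dense layer
        self.A = nn.Parameter(torch.randn(rank, linear.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(linear.out_features, rank))
        self.scale = alpha / rank            # B starts at zero: no-op init

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```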
Cool-fusion: Fuse large language models without training
C Liu, X Quan, Y Pan, L Lin, W Wu, X Chen - arXiv preprint arXiv:…, 2024 - arxiv.org
We focus on the problem of fusing two or more heterogeneous large language models
(LLMs) to facilitate their complementary strengths. One of the challenges of model fusion is …
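A training-free fusion baseline in the same spirit, heavily simplified: average the next-token distributions of two causal LMs at each decoding step. This assumes a shared tokenizer, which sidesteps exactly the heterogeneity challenge the abstract raises; the paper's own mechanism is more involved, and the model names below are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("model-a")           # placeholder ids
m_a = AutoModelForCausalLM.from_pretrained("model-a")
m_b = AutoModelForCausalLM.from_pretrained("model-b")    # same tokenizer assumed

@torch.no_grad()
def fused_generate(prompt, max_new_tokens=32):
    ids = tok(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        p_a = torch.softmax(m_a(ids).logits[:, -1], dim=-1)
        p_b = torch.softmax(m_b(ids).logits[:, -1], dim=-1)
        nxt = (0.5 * p_a + 0.5 * p_b).argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, nxt], dim=-1)
    return tok.decode(ids[0], skip_special_tokens=True)
```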
ATM: Improving model merging by alternating tuning and merging
Model merging has recently emerged as a cost-efficient paradigm for multi-task learning.
Among current approaches, task arithmetic stands out for its simplicity and effectiveness. In …
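A sketch of the alternating loop suggested by the title: repeatedly fine-tune one copy per task from the current merged model, then merge the resulting task vectors back. The `finetune` hook and dict-of-tensors representation are illustrative:

```python
import copy

def alternate_tune_and_merge(base, tasks, finetune, rounds=3, alpha=1.0):
    merged = base
    for _ in range(rounds):
        deltas = []
        for task in tasks:
            tuned = finetune(copy.deepcopy(merged), task)  # user-supplied loop
            deltas.append({k: tuned[k] - merged[k] for k in merged})
        merged = {                    # task arithmetic on the average delta
            k: merged[k] + alpha * sum(d[k] for d in deltas) / len(deltas)
            for k in merged
        }
    return merged
```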
How Good is a Single Basin?
The multi-modal nature of neural loss landscapes is often considered to be the main driver
behind the empirical success of deep ensembles. In this work, we probe this belief by …
A Second-Order Perspective on Compositionality and Incremental Learning
The fine-tuning of deep pre-trained models has recently revealed compositional properties.
This enables the arbitrary composition of multiple specialized modules into a single, multi …
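One concrete way second-order information enters such compositions is Fisher-weighted merging: weight each module's parameters by a diagonal Fisher estimate, so that directions a task is sensitive to dominate the average. A generic sketch, not necessarily this paper's exact construction:

```python
import torch

def fisher_weighted_merge(state_dicts, fishers, eps=1e-8):
    """state_dicts / fishers: parallel lists of {name: tensor} dicts."""
    merged = {}
    for key in state_dicts[0]:
        num = sum(f[key] * sd[key] for sd, f in zip(state_dicts, fishers))
        den = sum(f[key] for f in fishers) + eps
        merged[key] = num / den
    return merged
```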