Model merging in LLMs, MLLMs, and beyond: Methods, theories, applications and opportunities

E Yang, L Shen, G Guo, X Wang, X Cao… - arxiv preprint arxiv …, 2024 - arxiv.org
Model merging is an efficient empowerment technique in the machine learning community
that requires neither the collection of raw training data nor expensive …
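For context, the simplest instance of model merging is uniform parameter averaging of fine-tuned checkpoints of the same architecture; the survey covers far more sophisticated schemes (task arithmetic, TIES, and others). A minimal sketch, with hypothetical checkpoint names:

```python
# Minimal sketch: merging fine-tuned checkpoints by uniform parameter averaging.
# This is only the simplest merging scheme; checkpoint paths are hypothetical.
import torch

def average_merge(state_dicts, weights=None):
    """Average a list of state_dicts (same architecture) parameter-wise."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged

# Usage (paths are placeholders):
# sds = [torch.load(p, map_location="cpu") for p in ["ft_math.pt", "ft_code.pt"]]
# merged = average_merge(sds)
```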

A closer look into mixture-of-experts in large language models

KM Lo, Z Huang, Z Qiu, Z Wang, J Fu - arxiv preprint arxiv:2406.18219, 2024 - arxiv.org
Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and
remarkable performance, especially for language tasks. By sparsely activating a subset of …
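The sparse activation the snippet refers to is typically implemented as top-k routing: a small router scores all experts per token and only the k highest-scoring expert FFNs are run. A minimal sketch with illustrative sizes, not any specific model's configuration:

```python
# Minimal sketch of sparse top-k expert routing: only k of n_experts FFNs
# run per token. Sizes are illustrative.
import torch, torch.nn as nn, torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                        # x: (tokens, d_model)
        gate_logits = self.router(x)             # (tokens, n_experts)
        topk_val, topk_idx = gate_logits.topk(self.k, dim=-1)
        probs = F.softmax(topk_val, dim=-1)      # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += probs[mask, slot, None] * self.experts[e](x[mask])
        return out

# tokens = torch.randn(16, 64); y = TopKMoE()(tokens)
```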

LoRA Soups: Merging LoRAs for practical skill composition tasks

A Prabhakar, Y Li, K Narasimhan, S Kakade… - arxiv preprint arxiv …, 2024 - arxiv.org
Low-Rank Adaptation (LoRA) is a popular technique for parameter-efficient fine-tuning of
Large Language Models (LLMs). We study how different LoRA modules can be merged to …
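One simple way to compose LoRA modules, shown here only to illustrate the arithmetic rather than the paper's specific strategies, is to sum their low-rank updates ΔW_i = B_i A_i into a single delta on the frozen base weight:

```python
# Sketch of composing LoRA adapters by a weighted sum of their low-rank
# updates; shapes are made up and this is not the paper's full method.
import torch

def merge_lora_deltas(base_W, loras, alphas=None):
    """base_W: (out, in); loras: list of (A, B) with A: (r, in), B: (out, r)."""
    if alphas is None:
        alphas = [1.0 / len(loras)] * len(loras)
    delta = sum(a * (B @ A) for a, (A, B) in zip(alphas, loras))
    return base_W + delta

# base = torch.randn(512, 512)
# lora_math = (torch.randn(8, 512), torch.randn(512, 8))
# lora_code = (torch.randn(8, 512), torch.randn(512, 8))
# merged_W = merge_lora_deltas(base, [lora_math, lora_code])
```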

LLM merging: Building LLMs efficiently through merging

D Tam, M Li, P Yadav, RB Gabrielsson… - NeurIPS 2024 …, 2024 - openreview.net
Training high-performing large language models (LLMs) from scratch is a notoriously
expensive and difficult task, costing hundreds of millions of dollars in compute alone. These …

Diversifying the expert knowledge for task-agnostic pruning in sparse mixture-of-experts

Z Zhang, X Liu, H Cheng, C Xu, J Gao - arxiv preprint arxiv:2407.09590, 2024 - arxiv.org
By increasing model parameters but activating them sparsely when performing a task, the
Mixture-of-Experts (MoE) architecture significantly improves the performance of Large …

MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router

Y **e, Z Zhang, D Zhou, C **e, Z Song, X Liu… - arxiv preprint arxiv …, 2024 - arxiv.org
Mixture-of-Experts (MoE) architectures face challenges such as high memory consumption
and redundancy in experts. Pruning MoE can reduce network weights while maintaining …
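A generic illustration of router-guided pruning, in the spirit of the title but not MoE-Pruner's actual criterion: collect router outputs over a calibration set, rank experts by how often they are selected, and keep the most-used ones. The shapes and the keep threshold below are assumptions:

```python
# Generic sketch of using router statistics to rank experts for pruning,
# with synthetic data; NOT MoE-Pruner's specific criterion.
import torch

def expert_usage(router_logits, k=2):
    """router_logits: (tokens, n_experts). Frequency of each expert appearing in the top-k."""
    topk_idx = router_logits.topk(k, dim=-1).indices                  # (tokens, k)
    counts = torch.bincount(topk_idx.flatten(), minlength=router_logits.shape[-1])
    return counts.float() / counts.sum()

# logits = torch.randn(10_000, 8)              # router outputs over a calibration set
# usage = expert_usage(logits)
# keep = usage.argsort(descending=True)[:6]    # e.g. keep the 6 most-used experts
```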

MoE-I²: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition

C Yang, Y Sui, J Xiao, L Huang, Y Gong… - arxiv preprint arxiv …, 2024 - arxiv.org
The emergence of Mixture of Experts (MoE) LLMs has significantly advanced the
development of language models. Compared to traditional LLMs, MoE LLMs outperform …
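The intra-expert low-rank decomposition named in the title can be illustrated with a truncated SVD of an expert's weight matrix, stored as two thin factors; the rank and the choice of layers to factor are the paper's contribution, and the numbers here are arbitrary:

```python
# Sketch of low-rank factorization of one weight matrix via truncated SVD.
import torch

def low_rank_factor(W, rank):
    """W: (out, in) -> (U_r, V_r) with U_r @ V_r ≈ W at the given rank."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]          # fold singular values into the left factor
    V_r = Vh[:rank, :]
    return U_r, V_r

# W = torch.randn(1024, 4096)
# U_r, V_r = low_rank_factor(W, rank=128)
# rel_error = torch.linalg.norm(W - U_r @ V_r) / torch.linalg.norm(W)
```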

A Survey on Inference Optimization Techniques for Mixture of Experts Models

J Liu, P Tang, W Wang, Y Ren, X Hou, PA Heng… - arxiv preprint arxiv …, 2024 - arxiv.org
The emergence of large-scale Mixture of Experts (MoE) models has marked a significant
advancement in artificial intelligence, offering enhanced model capacity and computational …

HOBBIT: A mixed precision expert offloading system for fast MoE inference

P Tang, J Liu, X Hou, Y Pu, J Wang, PA Heng… - arxiv preprint arxiv …, 2024 - arxiv.org
The Mixture-of-Experts (MoE) architecture has demonstrated significant advantages in the
era of Large Language Models (LLMs), offering enhanced capabilities with reduced …
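The general offloading idea, sketched here only as an illustration and not as HOBBIT's actual scheduling or quantization pipeline, is to keep frequently used experts resident on the GPU (possibly at reduced precision) and fetch cold experts from CPU memory on demand:

```python
# Hedged sketch of on-demand expert offloading with a small GPU-resident cache.
# Eviction policy, precision choice, and cache size are illustrative assumptions.
import torch

class ExpertCache:
    def __init__(self, cpu_experts, capacity=2, device="cuda"):
        self.cpu_experts = cpu_experts        # dict: expert_id -> nn.Module kept on CPU
        self.capacity = capacity
        self.device = device
        self.resident = {}                    # expert_id -> module currently on GPU

    def get(self, expert_id):
        if expert_id not in self.resident:
            if len(self.resident) >= self.capacity:
                # evict an arbitrary resident expert back to CPU
                evict_id, evicted = self.resident.popitem()
                self.cpu_experts[evict_id] = evicted.to("cpu")
            # load the requested expert at reduced precision
            self.resident[expert_id] = self.cpu_experts[expert_id].to(
                self.device, dtype=torch.float16)
        return self.resident[expert_id]
```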

Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning

S Sarkar, L Lausen, V Cevher, S Zha, T Brox… - arxiv preprint arxiv …, 2024 - arxiv.org
Sparse Mixture of Experts (SMoE) models have emerged as a scalable alternative to dense
models in language modeling. These models use conditionally activated feedforward …