Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
Model merging is an efficient empowerment technique in the machine learning community
that does not require the collection of raw training data and does not require expensive …
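The core idea of merging — combining separately fine-tuned checkpoints directly in weight space without retraining — can be sketched with simple parameter averaging. This is a minimal illustration only; the function and names below are hypothetical, not the paper's method.

```python
# Hypothetical sketch: weighted averaging of checkpoint parameters.
# State dicts map parameter names to values; here plain floats stand in
# for tensors so the example stays self-contained.

def merge_state_dicts(state_dicts, weights=None):
    """Average each parameter element-wise across fine-tuned checkpoints."""
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts))
    return merged
```

Uniform weights give a simple "model soup"; non-uniform weights let one checkpoint dominate the merge.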
A closer look into mixture-of-experts in large language models
Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and
remarkable performance, especially for language tasks. By sparsely activating a subset of …
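The sparse activation the abstract mentions is typically implemented as top-k routing: a gate scores all experts, but only the k highest-scoring experts run on each input. A minimal sketch, with illustrative names and pure-Python experts standing in for feed-forward networks:

```python
import math

def top_k_route(logits, k=2):
    """Select the k highest-scoring experts and softmax-normalize their gates."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]

def moe_forward(x, experts, logits, k=2):
    """Combine outputs of only the selected experts, weighted by their gates."""
    return sum(gate * experts[i](x) for i, gate in top_k_route(logits, k))
```

Only k experts execute per input, which is why MoE layers can grow total parameter count without a proportional increase in per-token compute.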
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks
Low-Rank Adaptation (LoRA) is a popular technique for parameter-efficient fine-tuning of
Large Language Models (LLMs). We study how different LoRA modules can be merged to …
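One common way to merge LoRA modules is to sum each adapter's low-rank weight update B @ A into a single delta, which is algebraically the same as concatenating the factors along the rank axis. A self-contained sketch using nested lists as matrices (the helper names are illustrative, not the paper's API):

```python
def matmul(B, A):
    """Multiply two matrices given as nested lists."""
    return [[sum(b * a for b, a in zip(row, col)) for col in zip(*A)]
            for row in B]

def add_mats(X, Y):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def merged_delta(loras):
    """Sum each adapter's low-rank update B @ A into one weight delta."""
    deltas = [matmul(B, A) for B, A in loras]
    out = deltas[0]
    for d in deltas[1:]:
        out = add_mats(out, d)
    return out
```

The merged delta is then added to the frozen base weights, so inference pays no extra cost regardless of how many adapters were combined.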
LLM Merging: Building LLMs Efficiently through Merging
Training high-performing large language models (LLMs) from scratch is a notoriously
expensive and difficult task, costing hundreds of millions of dollars in compute alone. These …
Diversifying the expert knowledge for task-agnostic pruning in sparse mixture-of-experts
By increasing model parameters but activating them sparsely when performing a task, the
use of Mixture-of-Experts (MoE) architecture significantly improves the performance of Large …
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
Mixture-of-Experts (MoE) architectures face challenges such as high memory consumption
and redundancy in experts. Pruning MoE can reduce network weights while maintaining …
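Expert pruning of the kind these abstracts describe can be illustrated with a generic heuristic: rank experts by an aggregate router statistic and keep only the top few. This is a simplified stand-in, not the specific criterion used by MoE-Pruner.

```python
def prune_experts(router_scores, experts, keep=2):
    """Keep the `keep` experts with the highest aggregate router scores.

    A generic frequency/score-based heuristic for illustration only;
    real methods use more refined router- and weight-aware criteria.
    """
    ranked = sorted(range(len(experts)),
                    key=lambda i: router_scores[i], reverse=True)
    kept_indices = sorted(ranked[:keep])
    return [experts[i] for i in kept_indices], kept_indices
```

After pruning, the router's output dimension must shrink to match the surviving experts, and a light fine-tuning pass usually recovers most of the lost accuracy.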
MoE-I: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition
The emergence of Mixture of Experts (MoE) LLMs has significantly advanced the
development of language models. Compared to traditional LLMs, MoE LLMs outperform …
A Survey on Inference Optimization Techniques for Mixture of Experts Models
J Liu, P Tang, W Wang, Y Ren, X Hou, PA Heng… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence of large-scale Mixture of Experts (MoE) models has marked a significant
advancement in artificial intelligence, offering enhanced model capacity and computational …
HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference
P Tang, J Liu, X Hou, Y Pu, J Wang, PA Heng… - arXiv preprint arXiv …, 2024 - arxiv.org
The Mixture-of-Experts (MoE) architecture has demonstrated significant advantages in the
era of Large Language Models (LLMs), offering enhanced capabilities with reduced …
Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning
Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense
models in language modeling. These models use conditionally activated feedforward …