Model merging in LLMs, MLLMs, and beyond: Methods, theories, applications and opportunities

E Yang, L Shen, G Guo, X Wang, X Cao… - arxiv preprint arxiv …, 2024 - arxiv.org
Model merging is an efficient empowerment technique in the machine learning community
that requires neither the collection of raw training data nor expensive …
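For context, the simplest instance of model merging is uniform parameter averaging of fine-tuned checkpoints of the same architecture; the survey covers far more sophisticated schemes (task arithmetic, TIES, and others). A minimal sketch, with hypothetical checkpoint names:

```python
# Minimal sketch: merging fine-tuned checkpoints by uniform parameter averaging.
# This is only the simplest merging scheme; checkpoint paths are hypothetical.
import torch

def average_merge(state_dicts, weights=None):
    """Average a list of state_dicts (same architecture) parameter-wise."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged

# Usage (paths are placeholders):
# sds = [torch.load(p, map_location="cpu") for p in ["ft_math.pt", "ft_code.pt"]]
# merged = average_merge(sds)
```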

A closer look into mixture-of-experts in large language models

KM Lo, Z Huang, Z Qiu, Z Wang, J Fu - arxiv preprint arxiv:2406.18219, 2024 - arxiv.org
Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and
remarkable performance, especially for language tasks. By sparsely activating a subset of …
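The sparse activation the snippet refers to is typically implemented as top-k routing: a small router scores all experts per token and only the k highest-scoring expert FFNs are run. A minimal sketch with illustrative sizes, not any specific model's configuration:

```python
# Minimal sketch of sparse top-k expert routing: only k of n_experts FFNs
# run per token. Sizes are illustrative.
import torch, torch.nn as nn, torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                        # x: (tokens, d_model)
        gate_logits = self.router(x)             # (tokens, n_experts)
        topk_val, topk_idx = gate_logits.topk(self.k, dim=-1)
        probs = F.softmax(topk_val, dim=-1)      # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += probs[mask, slot, None] * self.experts[e](x[mask])
        return out

# tokens = torch.randn(16, 64); y = TopKMoE()(tokens)
```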

LoRA Soups: Merging LoRAs for practical skill composition tasks

A Prabhakar, Y Li, K Narasimhan, S Kakade… - arxiv preprint arxiv …, 2024 - arxiv.org
Low-Rank Adaptation (LoRA) is a popular technique for parameter-efficient fine-tuning of
Large Language Models (LLMs). We study how different LoRA modules can be merged to …
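One simple way to compose LoRA modules, shown here only to illustrate the arithmetic rather than the paper's specific strategies, is to sum their low-rank updates ΔW_i = B_i A_i into a single delta on the frozen base weight:

```python
# Sketch of composing LoRA adapters by a weighted sum of their low-rank
# updates; shapes are made up and this is not the paper's full method.
import torch

def merge_lora_deltas(base_W, loras, alphas=None):
    """base_W: (out, in); loras: list of (A, B) with A: (r, in), B: (out, r)."""
    if alphas is None:
        alphas = [1.0 / len(loras)] * len(loras)
    delta = sum(a * (B @ A) for a, (A, B) in zip(alphas, loras))
    return base_W + delta

# base = torch.randn(512, 512)
# lora_math = (torch.randn(8, 512), torch.randn(512, 8))
# lora_code = (torch.randn(8, 512), torch.randn(512, 8))
# merged_W = merge_lora_deltas(base, [lora_math, lora_code])
```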

LLM merging: Building LLMs efficiently through merging

D Tam, M Li, P Yadav, RB Gabrielsson… - NeurIPS 2024 …, 2024 - openreview.net
Training high-performing large language models (LLMs) from scratch is a notoriously
expensive and difficult task, costing hundreds of millions of dollars in compute alone. These …

Diversifying the expert knowledge for task-agnostic pruning in sparse mixture-of-experts

Z Zhang, X Liu, H Cheng, C Xu, J Gao - arxiv preprint arxiv:2407.09590, 2024 - arxiv.org
By increasing model parameters but activating them sparsely when performing a task, the
Mixture-of-Experts (MoE) architecture significantly improves the performance of Large …

MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router

Y **e, Z Zhang, D Zhou, C **e, Z Song, X Liu… - arxiv preprint arxiv …, 2024 - arxiv.org
Mixture-of-Experts (MoE) architectures face challenges such as high memory consumption
and redundancy in experts. Pruning MoE can reduce network weights while maintaining …
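A generic illustration of router-guided pruning, in the spirit of the title but not MoE-Pruner's actual criterion: collect router outputs over a calibration set, rank experts by how often they are selected, and keep the most-used ones. The shapes and the keep threshold below are assumptions:

```python
# Generic sketch of using router statistics to rank experts for pruning,
# with synthetic data; NOT MoE-Pruner's specific criterion.
import torch

def expert_usage(router_logits, k=2):
    """router_logits: (tokens, n_experts). Frequency of each expert appearing in the top-k."""
    topk_idx = router_logits.topk(k, dim=-1).indices                  # (tokens, k)
    counts = torch.bincount(topk_idx.flatten(), minlength=router_logits.shape[-1])
    return counts.float() / counts.sum()

# logits = torch.randn(10_000, 8)              # router outputs over a calibration set
# usage = expert_usage(logits)
# keep = usage.argsort(descending=True)[:6]    # e.g. keep the 6 most-used experts
```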

MoE-I²: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition

C Yang, Y Sui, J Xiao, L Huang, Y Gong… - arxiv preprint arxiv …, 2024 - arxiv.org
The emergence of Mixture of Experts (MoE) LLMs has significantly advanced the
development of language models. Compared to traditional LLMs, MoE LLMs outperform …
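The intra-expert low-rank decomposition named in the title can be illustrated with a truncated SVD of an expert's weight matrix, stored as two thin factors; the rank and the choice of layers to factor are the paper's contribution, and the numbers here are arbitrary:

```python
# Sketch of low-rank factorization of one weight matrix via truncated SVD.
import torch

def low_rank_factor(W, rank):
    """W: (out, in) -> (U_r, V_r) with U_r @ V_r ≈ W at the given rank."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]          # fold singular values into the left factor
    V_r = Vh[:rank, :]
    return U_r, V_r

# W = torch.randn(1024, 4096)
# U_r, V_r = low_rank_factor(W, rank=128)
# rel_error = torch.linalg.norm(W - U_r @ V_r) / torch.linalg.norm(W)
```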

A Survey on Inference Optimization Techniques for Mixture of Experts Models

J Liu, P Tang, W Wang, Y Ren, X Hou, PA Heng… - arxiv preprint arxiv …, 2024 - arxiv.org
The emergence of large-scale Mixture of Experts (MoE) models has marked a significant
advancement in artificial intelligence, offering enhanced model capacity and computational …

HOBBIT: A mixed precision expert offloading system for fast MoE inference

P Tang, J Liu, X Hou, Y Pu, J Wang, PA Heng… - arxiv preprint arxiv …, 2024 - arxiv.org
The Mixture-of-Experts (MoE) architecture has demonstrated significant advantages in the
era of Large Language Models (LLMs), offering enhanced capabilities with reduced …
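The general offloading idea, sketched here only as an illustration and not as HOBBIT's actual scheduling or quantization pipeline, is to keep frequently used experts resident on the GPU (possibly at reduced precision) and fetch cold experts from CPU memory on demand:

```python
# Hedged sketch of on-demand expert offloading with a small GPU-resident cache.
# Eviction policy, precision choice, and cache size are illustrative assumptions.
import torch

class ExpertCache:
    def __init__(self, cpu_experts, capacity=2, device="cuda"):
        self.cpu_experts = cpu_experts        # dict: expert_id -> nn.Module kept on CPU
        self.capacity = capacity
        self.device = device
        self.resident = {}                    # expert_id -> module currently on GPU

    def get(self, expert_id):
        if expert_id not in self.resident:
            if len(self.resident) >= self.capacity:
                # evict an arbitrary resident expert back to CPU
                evict_id, evicted = self.resident.popitem()
                self.cpu_experts[evict_id] = evicted.to("cpu")
            # load the requested expert at reduced precision
            self.resident[expert_id] = self.cpu_experts[expert_id].to(
                self.device, dtype=torch.float16)
        return self.resident[expert_id]
```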

Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning

S Sarkar, L Lausen, V Cevher, S Zha, T Brox… - arxiv preprint arxiv …, 2024 - arxiv.org
Sparse Mixture of Experts (SMoE) models have emerged as a scalable alternative to dense
models in language modeling. These models use conditionally activated feedforward …