A survey on mixture of experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have garnered unprecedented advancements across
diverse fields, ranging from natural language processing to computer vision and beyond …

LLaMA-MoE: Building mixture-of-experts from LLaMA with continual pre-training

T Zhu, X Qu, D Dong, J Ruan, J Tong… - Proceedings of the …, 2024 - aclanthology.org
Mixture-of-Experts (MoE) has gained increasing popularity as a promising
framework for scaling up large language models (LLMs). However, training MoE from …

MapLM: A real-world large-scale vision-language benchmark for map and traffic scene understanding

X Cao, T Zhou, Y Ma, W Ye, C Cui… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-language generative AI has demonstrated remarkable promise for empowering cross-
modal scene understanding of autonomous driving and high-definition (HD) map systems …

Model compression and efficient inference for large language models: A survey

W Wang, W Chen, Y Luo, Y Long, Z Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer-based large language models have achieved tremendous success. However,
the significant memory and computational costs incurred during the inference process make …

MVMoE: Multi-task vehicle routing solver with mixture-of-experts

J Zhou, Z Cao, Y Wu, W Song, Y Ma, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning to solve vehicle routing problems (VRPs) has garnered much attention. However,
most neural solvers are only structured and trained independently on a specific problem …

Lory: Fully differentiable mixture-of-experts for autoregressive language model pre-training

Z Zhong, M Xia, D Chen, M Lewis - arXiv preprint arXiv:2405.03133, 2024 - arxiv.org
Mixture-of-experts (MoE) models facilitate efficient scaling; however, training the router
network introduces the challenge of optimizing a non-differentiable, discrete objective …
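
As background for the non-differentiable routing objective that the Lory abstract refers to, the following is a minimal sketch of a conventional sparsely-gated top-k router (PyTorch; the function and variable names are illustrative, not Lory's method). The hard top-k selection is the discrete step that gradients cannot flow through, which fully differentiable MoE designs aim to replace.

import torch
import torch.nn.functional as F

def topk_route(hidden, router_weight, k=2):
    # hidden: [num_tokens, d_model]; router_weight: [d_model, num_experts]
    logits = hidden @ router_weight                                 # per-expert scores for each token
    gate_probs = F.softmax(logits, dim=-1)
    topk_probs, topk_idx = torch.topk(gate_probs, k, dim=-1)        # hard, discrete expert selection
    topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)  # renormalize over the chosen experts
    return topk_probs, topk_idx

tokens = torch.randn(4, 8)      # 4 tokens, model dimension 8
w_router = torch.randn(8, 4)    # router projection onto 4 experts
probs, idx = topk_route(tokens, w_router)
print(idx)    # chosen expert indices per token; no gradient flows through this choice
print(probs)  # gate weights used to mix the chosen experts' outputs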

Combining fine-tuning and LLM-based agents for intuitive smart contract auditing with justifications

W Ma, D Wu, Y Sun, T Wang, S Liu, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Smart contracts are decentralized applications built atop blockchains like Ethereum. Recent
research has shown that large language models (LLMs) have potential in auditing smart …

Branch-Train-MiX: Mixing expert LLMs into a mixture-of-experts LLM

S Sukhbaatar, O Golovneva, V Sharma, H Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
We investigate efficient methods for training Large Language Models (LLMs) to possess
capabilities in multiple specialized domains, such as coding, math reasoning and world …

Demystifying the compression of mixture-of-experts through a unified framework

S He, D Dong, L Ding, A Li - arXiv preprint arXiv:2406.02500, 2024 - arxiv.org
Scaling large language models has revolutionized the performance across diverse domains,
yet the continual growth in model size poses significant challenges for real-world …

Shortcut-connected expert parallelism for accelerating mixture-of-experts

W Cai, J Jiang, L Qin, J Cui, S Kim, J Huang - arXiv preprint arXiv …, 2024 - arxiv.org
Expert parallelism has been introduced as a strategy to distribute the computational
workload of sparsely-gated mixture-of-experts (MoE) models across multiple computing …
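
To make the dispatch pattern behind expert parallelism concrete, here is a minimal single-process sketch (PyTorch; the name dispatch_to_experts and the top-1 routing are illustrative assumptions, not the paper's implementation). Grouping tokens by their assigned expert, as done below in one process, is the exchange that an all-to-all communication step performs when the experts are sharded across devices.

import torch

def dispatch_to_experts(tokens, expert_idx, expert_weights):
    # tokens: [n, d]; expert_idx: [n] chosen expert per token (top-1 for simplicity)
    out = torch.empty_like(tokens)
    for e, w in enumerate(expert_weights):
        mask = expert_idx == e              # tokens routed to expert e
        if mask.any():
            out[mask] = tokens[mask] @ w    # expert e processes only its own tokens
    return out

d, num_experts, n = 8, 4, 16
expert_weights = [torch.randn(d, d) for _ in range(num_experts)]      # stand-in expert FFNs
tokens = torch.randn(n, d)
expert_idx = torch.randint(num_experts, (n,))                         # router output (top-1)
print(dispatch_to_experts(tokens, expert_idx, expert_weights).shape)  # torch.Size([16, 8])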