LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-Training

T Zhu, X Qu, D Dong, J Ruan, J Tong… - Proceedings of the …, 2024 - aclanthology.org
Mixture-of-Experts (MoE) has gained increasing popularity as a promising
framework for scaling up large language models (LLMs). However, training MoE from …

A Survey on Mixture of Experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have garnered unprecedented advancements across
diverse fields, ranging from natural language processing to computer vision and beyond …

A Closer Look into Mixture-of-Experts in Large Language Models

KM Lo, Z Huang, Z Qiu, Z Wang, J Fu - arXiv preprint arXiv:2406.18219, 2024 - arxiv.org
Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and
remarkable performance, especially for language tasks. By sparsely activating a subset of …
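To make the mechanism this abstract alludes to concrete, below is a minimal, generic sketch of what "sparsely activating a subset of experts" usually means in an MoE layer: a learned router scores every expert per token, but only the top-k experts are actually evaluated and their outputs are combined with renormalized gate weights. This is an illustrative PyTorch sketch, not code from the paper; the class name SimpleMoE and all dimensions/hyperparameters are hypothetical.

```python
# Illustrative sketch of a sparsely activated MoE layer (not from the cited paper).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)            # per-expert routing logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                       # x: (tokens, d_model)
        logits = self.router(x)                                 # (tokens, n_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)   # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)                    # renormalize over selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                          # each token visits only k experts
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * self.experts[e](x[mask])
        return out
```

With top_k much smaller than n_experts, each token pays the compute cost of only k expert MLPs while the model's parameter count grows with the number of experts, which is the scaling property these papers study.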

Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications

W Ma, D Wu, Y Sun, T Wang, S Liu, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Smart contracts are decentralized applications built atop blockchains like Ethereum. Recent
research has shown that large language models (LLMs) have potential in auditing smart …

Theory on Mixture-of-Experts in Continual Learning

H Li, S Lin, L Duan, Y Liang, NB Shroff - arXiv preprint arXiv:2406.16437, 2024 - arxiv.org
Continual learning (CL) has garnered significant attention because of its ability to adapt to
new tasks that arrive over time. Catastrophic forgetting (of old tasks) has been identified as a …

Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models--The Story Goes On

L Zeng, L Zhong, L Zhao, T Wei, L Yang, J He… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we investigate the underlying factors that potentially enhance the mathematical
reasoning capabilities of large language models (LLMs). We argue that the data scaling law …

MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding

X Cao, T Zhou, Y Ma, W Ye, C Cui… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-language generative AI has demonstrated remarkable promise for empowering cross-
modal scene understanding of autonomous driving and high-definition (HD) map systems …

MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts

J Zhou, Z Cao, Y Wu, W Song, Y Ma, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning to solve vehicle routing problems (VRPs) has garnered much attention. However,
most neural solvers are only structured and trained independently on a specific problem …

Model Compression and Efficient Inference for Large Language Models: A Survey

W Wang, W Chen, Y Luo, Y Long, Z Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer based large language models have achieved tremendous success. However,
the significant memory and computational costs incurred during the inference process make …

Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training

Z Zhong, M **a, D Chen, M Lewis - arxiv preprint arxiv:2405.03133, 2024 - arxiv.org
Mixture-of-experts (MoE) models facilitate efficient scaling; however, training the router
network introduces the challenge of optimizing a non-differentiable, discrete objective …
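The routing difficulty mentioned in this abstract can be illustrated with a small, generic contrast between a discrete router and a soft, differentiable one. This is not Lory's actual method; it only shows why an argmax-based expert choice blocks gradient flow to the router, which is the kind of problem a fully differentiable MoE design is meant to avoid. All names and sizes below are hypothetical.

```python
# Illustrative contrast (not Lory's implementation): hard vs. soft expert routing.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, n_experts = 16, 4
x = torch.randn(1, d)
router = torch.nn.Linear(d, n_experts)
experts = torch.nn.ModuleList(torch.nn.Linear(d, d) for _ in range(n_experts))

# Hard routing: argmax picks a single expert; the discrete choice has no
# gradient, so the router receives no learning signal through this path.
hard_choice = router(x).argmax(dim=-1)            # discrete, non-differentiable
y_hard = experts[hard_choice.item()](x)

# Soft routing: a softmax-weighted combination of all expert outputs is fully
# differentiable, so the router can be trained end-to-end with plain backprop.
gates = F.softmax(router(x), dim=-1)              # (1, n_experts), differentiable
y_soft = sum(gates[:, e:e + 1] * experts[e](x) for e in range(n_experts))
y_soft.sum().backward()                           # gradients reach router.weight
print(router.weight.grad is not None)             # True
```

The trade-off is that the soft variant evaluates every expert, losing the sparsity that makes MoE cheap at scale; differentiable-MoE work such as Lory targets ways to keep end-to-end training without paying that full dense cost.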