LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training
Mixture-of-Experts (MoE) has gained increasing popularity as a promising framework for scaling up large language models (LLMs). However, training MoE from …
A Survey on Mixture of Experts
Large language models (LLMs) have garnered unprecedented advancements across diverse fields, ranging from natural language processing to computer vision and beyond …
A Closer Look into Mixture-of-Experts in Large Language Models
Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and remarkable performance, especially for language tasks. By sparsely activating a subset of …
Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications
Smart contracts are decentralized applications built atop blockchains like Ethereum. Recent research has shown that large language models (LLMs) have potential in auditing smart …
Theory on Mixture-of-Experts in Continual Learning
Continual learning (CL) has garnered significant attention because of its ability to adapt to new tasks that arrive over time. Catastrophic forgetting (of old tasks) has been identified as a …
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models--The Story Goes On
In this paper, we investigate the underlying factors that potentially enhance the mathematical reasoning capabilities of large language models (LLMs). We argue that the data scaling law …
MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding
Vision-language generative AI has demonstrated remarkable promise for empowering cross-modal scene understanding of autonomous driving and high-definition (HD) map systems …
MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts
Learning to solve vehicle routing problems (VRPs) has garnered much attention. However, most neural solvers are only structured and trained independently on a specific problem …
Model Compression and Efficient Inference for Large Language Models: A Survey
Transformer-based large language models have achieved tremendous success. However, the significant memory and computational costs incurred during the inference process make …
Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training
Mixture-of-experts (MoE) models facilitate efficient scaling; however, training the router network introduces the challenge of optimizing a non-differentiable, discrete objective …