LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-Training

T Zhu, X Qu, D Dong, J Ruan, J Tong… - Proceedings of the …, 2024 - aclanthology.org
Mixture-of-Experts (MoE) has gained increasing popularity as a promising
framework for scaling up large language models (LLMs). However, training MoE from …

A Survey on Mixture of Experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have garnered unprecedented advancements across
diverse fields, ranging from natural language processing to computer vision and beyond …

A Closer Look into Mixture-of-Experts in Large Language Models

KM Lo, Z Huang, Z Qiu, Z Wang, J Fu - arXiv preprint arXiv:2406.18219, 2024 - arxiv.org
Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and
remarkable performance, especially for language tasks. By sparsely activating a subset of …
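To make the mechanism this abstract alludes to concrete, below is a minimal, generic sketch of what "sparsely activating a subset of experts" usually means in an MoE layer: a learned router scores every expert per token, but only the top-k experts are actually evaluated and their outputs are combined with renormalized gate weights. This is an illustrative PyTorch sketch, not code from the paper; the class name SimpleMoE and all dimensions/hyperparameters are hypothetical.

```python
# Illustrative sketch of a sparsely activated MoE layer (not from the cited paper).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)            # per-expert routing logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                       # x: (tokens, d_model)
        logits = self.router(x)                                 # (tokens, n_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)   # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)                    # renormalize over selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                          # each token visits only k experts
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * self.experts[e](x[mask])
        return out
```

With top_k much smaller than n_experts, each token pays the compute cost of only k expert MLPs while the model's parameter count grows with the number of experts, which is the scaling property these papers study.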

Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications

W Ma, D Wu, Y Sun, T Wang, S Liu, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Smart contracts are decentralized applications built atop blockchains like Ethereum. Recent
research has shown that large language models (LLMs) have potential in auditing smart …

Theory on Mixture-of-Experts in Continual Learning

H Li, S Lin, L Duan, Y Liang, NB Shroff - arXiv preprint arXiv:2406.16437, 2024 - arxiv.org
Continual learning (CL) has garnered significant attention because of its ability to adapt to
new tasks that arrive over time. Catastrophic forgetting (of old tasks) has been identified as a …

Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models--The Story Goes On

L Zeng, L Zhong, L Zhao, T Wei, L Yang, J He… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we investigate the underlying factors that potentially enhance the mathematical
reasoning capabilities of large language models (LLMs). We argue that the data scaling law …

MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding

X Cao, T Zhou, Y Ma, W Ye, C Cui… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-language generative AI has demonstrated remarkable promise for empowering cross-
modal scene understanding of autonomous driving and high-definition (HD) map systems …

MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts

J Zhou, Z Cao, Y Wu, W Song, Y Ma, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning to solve vehicle routing problems (VRPs) has garnered much attention. However,
most neural solvers are only structured and trained independently on a specific problem …

Model Compression and Efficient Inference for Large Language Models: A Survey

W Wang, W Chen, Y Luo, Y Long, Z Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer based large language models have achieved tremendous success. However,
the significant memory and computational costs incurred during the inference process make …

Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training

Z Zhong, M **a, D Chen, M Lewis - arxiv preprint arxiv:2405.03133, 2024 - arxiv.org
Mixture-of-experts (MoE) models facilitate efficient scaling; however, training the router
network introduces the challenge of optimizing a non-differentiable, discrete objective …
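The routing difficulty mentioned in this abstract can be illustrated with a small, generic contrast between a discrete router and a soft, differentiable one. This is not Lory's actual method; it only shows why an argmax-based expert choice blocks gradient flow to the router, which is the kind of problem a fully differentiable MoE design is meant to avoid. All names and sizes below are hypothetical.

```python
# Illustrative contrast (not Lory's implementation): hard vs. soft expert routing.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, n_experts = 16, 4
x = torch.randn(1, d)
router = torch.nn.Linear(d, n_experts)
experts = torch.nn.ModuleList(torch.nn.Linear(d, d) for _ in range(n_experts))

# Hard routing: argmax picks a single expert; the discrete choice has no
# gradient, so the router receives no learning signal through this path.
hard_choice = router(x).argmax(dim=-1)            # discrete, non-differentiable
y_hard = experts[hard_choice.item()](x)

# Soft routing: a softmax-weighted combination of all expert outputs is fully
# differentiable, so the router can be trained end-to-end with plain backprop.
gates = F.softmax(router(x), dim=-1)              # (1, n_experts), differentiable
y_soft = sum(gates[:, e:e + 1] * experts[e](x) for e in range(n_experts))
y_soft.sum().backward()                           # gradients reach router.weight
print(router.weight.grad is not None)             # True
```

The trade-off is that the soft variant evaluates every expert, losing the sparsity that makes MoE cheap at scale; differentiable-MoE work such as Lory targets ways to keep end-to-end training without paying that full dense cost.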