A survey on mixture of experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have garnered unprecedented advancements across
diverse fields, ranging from natural language processing to computer vision and beyond …

LLaMA-MoE: Building mixture-of-experts from LLaMA with continual pre-training

T Zhu, X Qu, D Dong, J Ruan, J Tong… - Proceedings of the …, 2024 - aclanthology.org
Mixture-of-Experts (MoE) has gained increasing popularity as a promising
framework for scaling up large language models (LLMs). However, training MoE from …

MapLM: A real-world large-scale vision-language benchmark for map and traffic scene understanding

X Cao, T Zhou, Y Ma, W Ye, C Cui… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-language generative AI has demonstrated remarkable promise for empowering cross-
modal scene understanding of autonomous driving and high-definition (HD) map systems …

Model compression and efficient inference for large language models: A survey

W Wang, W Chen, Y Luo, Y Long, Z Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer-based large language models have achieved tremendous success. However,
the significant memory and computational costs incurred during the inference process make …

MVMoE: Multi-task vehicle routing solver with mixture-of-experts

J Zhou, Z Cao, Y Wu, W Song, Y Ma, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning to solve vehicle routing problems (VRPs) has garnered much attention. However,
most neural solvers are only structured and trained independently on a specific problem …

Lory: Fully differentiable mixture-of-experts for autoregressive language model pre-training

Z Zhong, M Xia, D Chen, M Lewis - arXiv preprint arXiv:2405.03133, 2024 - arxiv.org
Mixture-of-experts (MoE) models facilitate efficient scaling; however, training the router
network introduces the challenge of optimizing a non-differentiable, discrete objective …
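
As background for the non-differentiable routing objective that the Lory abstract refers to, the following is a minimal sketch of a conventional sparsely-gated top-k router (PyTorch; the function and variable names are illustrative, not Lory's method). The hard top-k selection is the discrete step that gradients cannot flow through, which fully differentiable MoE designs aim to replace.

import torch
import torch.nn.functional as F

def topk_route(hidden, router_weight, k=2):
    # hidden: [num_tokens, d_model]; router_weight: [d_model, num_experts]
    logits = hidden @ router_weight                                 # per-expert scores for each token
    gate_probs = F.softmax(logits, dim=-1)
    topk_probs, topk_idx = torch.topk(gate_probs, k, dim=-1)        # hard, discrete expert selection
    topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)  # renormalize over the chosen experts
    return topk_probs, topk_idx

tokens = torch.randn(4, 8)      # 4 tokens, model dimension 8
w_router = torch.randn(8, 4)    # router projection onto 4 experts
probs, idx = topk_route(tokens, w_router)
print(idx)    # chosen expert indices per token; no gradient flows through this choice
print(probs)  # gate weights used to mix the chosen experts' outputs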

Combining fine-tuning and LLM-based agents for intuitive smart contract auditing with justifications

W Ma, D Wu, Y Sun, T Wang, S Liu, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Smart contracts are decentralized applications built atop blockchains like Ethereum. Recent
research has shown that large language models (LLMs) have potential in auditing smart …

Branch-Train-MiX: Mixing expert LLMs into a mixture-of-experts LLM

S Sukhbaatar, O Golovneva, V Sharma, H Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
We investigate efficient methods for training Large Language Models (LLMs) to possess
capabilities in multiple specialized domains, such as coding, math reasoning and world …

Demystifying the compression of mixture-of-experts through a unified framework

S He, D Dong, L Ding, A Li - arXiv preprint arXiv:2406.02500, 2024 - arxiv.org
Scaling large language models has revolutionized the performance across diverse domains,
yet the continual growth in model size poses significant challenges for real-world …

Shortcut-connected expert parallelism for accelerating mixture-of-experts

W Cai, J Jiang, L Qin, J Cui, S Kim, J Huang - arXiv preprint arXiv …, 2024 - arxiv.org
Expert parallelism has been introduced as a strategy to distribute the computational
workload of sparsely-gated mixture-of-experts (MoE) models across multiple computing …
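
To make the dispatch pattern behind expert parallelism concrete, here is a minimal single-process sketch (PyTorch; the name dispatch_to_experts and the top-1 routing are illustrative assumptions, not the paper's implementation). Grouping tokens by their assigned expert, as done below in one process, is the exchange that an all-to-all communication step performs when the experts are sharded across devices.

import torch

def dispatch_to_experts(tokens, expert_idx, expert_weights):
    # tokens: [n, d]; expert_idx: [n] chosen expert per token (top-1 for simplicity)
    out = torch.empty_like(tokens)
    for e, w in enumerate(expert_weights):
        mask = expert_idx == e              # tokens routed to expert e
        if mask.any():
            out[mask] = tokens[mask] @ w    # expert e processes only its own tokens
    return out

d, num_experts, n = 8, 4, 16
expert_weights = [torch.randn(d, d) for _ in range(num_experts)]      # stand-in expert FFNs
tokens = torch.randn(n, d)
expert_idx = torch.randint(num_experts, (n,))                         # router output (top-1)
print(dispatch_to_experts(tokens, expert_idx, expert_weights).shape)  # torch.Size([16, 8])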