MM1: methods, analysis and insights from multimodal LLM pre-training
In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …
Demystifying softmax gating function in Gaussian mixture of experts
Understanding the parameter estimation of softmax gating Gaussian mixture of experts has
remained a long-standing open problem in the literature. It is mainly due to three …
Automatic expert selection for multi-scenario and multi-task search
Multi-scenario learning (MSL) enables a service provider to cater for users' fine-grained
demands by separating services for different user sectors, e.g., by user's geographical region …
Scaling diffusion transformers to 16 billion parameters
In this paper, we present DiT-MoE, a sparse version of the diffusion Transformer that is
scalable and competitive with dense networks while exhibiting highly optimized inference …
Mastering stock markets with efficient mixture of diversified trading experts
Quantitative stock investment is a fundamental financial task that relies heavily on accurate
prediction of market status and profitable investment decision-making. Despite recent …
On least squares estimation in softmax gating mixture of experts
The mixture of experts (MoE) model is a statistical machine learning design that aggregates
multiple expert networks using a softmax gating function in order to form a more intricate and …
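For reference, the softmax-gated MoE model this abstract describes is commonly written as a gate-weighted combination of expert outputs; the notation below (K experts h_i with gating parameters a_i, b_i) is generic and not taken from the paper:

    f(x) = \sum_{i=1}^{K} \frac{\exp(a_i^\top x + b_i)}{\sum_{j=1}^{K} \exp(a_j^\top x + b_j)} \, h_i(x),

where the softmax weights are nonnegative and sum to one, so the output is a convex combination of the individual expert networks h_1, ..., h_K.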
Ta-moe: Topology-aware large scale mixture-of-expert training
Sparsely gated Mixture-of-Experts (MoE) has demonstrated its effectiveness in
scaling up deep neural networks to an extreme scale. Although numerous efforts have …
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
Mixture-of-Experts (MoE) architectures face challenges such as high memory consumption
and redundancy in experts. Pruning MoE can reduce network weights while maintaining …
CompeteSMoE--Effective Training of Sparse Mixture of Experts via Competition
Sparse mixture of experts (SMoE) offers an appealing solution to scale up the model
complexity beyond the means of increasing the network's depth or width. However, effective …
Comet: Learning cardinality constrained mixture of experts with trees and local search
The sparse Mixture-of-Experts (Sparse-MoE) framework efficiently scales up model capacity
in various domains, such as natural language processing and vision. Sparse-MoEs select a …
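For context on the expert-selection step mentioned above, the sketch below illustrates generic top-k sparse gating in Python; the function name, shapes, and default k are assumptions for illustration, and it does not reproduce COMET's tree-based, local-search approach.

import numpy as np

def sparse_moe_forward(x, gate_w, experts, k=2):
    # Hypothetical illustration of top-k sparse gating; not taken from the COMET paper.
    # x: (d,) input vector; gate_w: (num_experts, d) gating weights;
    # experts: list of callables mapping (d,) -> output; k: experts activated per input.
    logits = gate_w @ x                              # one gating score per expert
    top_k = np.argsort(logits)[-k:]                  # indices of the k highest-scoring experts
    w = np.exp(logits[top_k] - logits[top_k].max())  # softmax restricted to the selected experts
    w /= w.sum()
    return sum(wi * experts[i](x) for wi, i in zip(w, top_k))

With, for example, eight experts and k = 2, only two expert forward passes are evaluated per input, which is where the efficiency of sparse MoE layers comes from.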