A review of sparse expert models in deep learning

W Fedus, J Dean, B Zoph - arXiv preprint arXiv:2209.01667, 2022 - arxiv.org
Sparse expert models are a thirty-year-old concept re-emerging as a popular architecture in
deep learning. This class of architecture encompasses Mixture-of-Experts, Switch …

Modular deep learning

J Pfeiffer, S Ruder, I Vulić, EM Ponti - arXiv preprint arXiv:2302.11529, 2023 - arxiv.org
Transfer learning has recently become the dominant paradigm of machine learning.
Pre-trained models fine-tuned for downstream tasks achieve better performance with fewer …

On the representation collapse of sparse mixture of experts

Z Chi, L Dong, S Huang, D Dai, S Ma… - Advances in …, 2022 - proceedings.neurips.cc
Sparse mixture of experts provides larger model capacity while requiring a constant
computational overhead. It employs the routing mechanism to distribute input tokens to the …
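
The routing mechanism referred to here is, in most sparse MoE designs, a learned gate that scores each token against every expert and dispatches the token only to the top-k of them. A minimal PyTorch sketch of such top-k token routing follows; the class names, expert width, and hyperparameters are illustrative assumptions rather than details taken from the cited paper.

    # Minimal top-k token routing for a sparse MoE layer (illustrative sketch).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKRouter(nn.Module):
        def __init__(self, d_model, num_experts, top_k=2):
            super().__init__()
            self.gate = nn.Linear(d_model, num_experts, bias=False)
            self.top_k = top_k

        def forward(self, x):
            # x: (num_tokens, d_model); score every token against every expert.
            probs = F.softmax(self.gate(x), dim=-1)                # (num_tokens, num_experts)
            weights, expert_idx = probs.topk(self.top_k, dim=-1)
            weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize the kept gates
            return weights, expert_idx                             # each (num_tokens, top_k)

    class SparseMoE(nn.Module):
        def __init__(self, d_model, d_ff, num_experts, top_k=2):
            super().__init__()
            self.router = TopKRouter(d_model, num_experts, top_k)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            )

        def forward(self, x):
            weights, expert_idx = self.router(x)
            out = torch.zeros_like(x)
            # Each token is processed only by its selected experts (loop kept for clarity).
            for e, expert in enumerate(self.experts):
                for slot in range(expert_idx.shape[-1]):
                    mask = expert_idx[:, slot] == e
                    if mask.any():
                        out[mask] = out[mask] + weights[mask, slot].unsqueeze(-1) * expert(x[mask])
            return out

For example, SparseMoE(d_model=64, d_ff=256, num_experts=8)(torch.randn(16, 64)) returns a (16, 64) tensor in which each token combines the outputs of its two highest-scoring experts; parameter count grows with num_experts while per-token compute stays roughly constant, which is the constant-overhead property the snippet describes.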

Multi-head mixture-of-experts

X Wu, S Huang, W Wang, F Wei - arXiv preprint arXiv:2404.15045, 2024 - arxiv.org
Sparse Mixtures of Experts (SMoE) scales model capacity without significant increases in
training and inference costs, but exhibits the following two issues: (1) Low expert activation …
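
"Low expert activation" generally means that only a small fraction of the experts receives tokens under the learned routing. A quick, self-contained diagnostic (with an untrained gate and arbitrary sizes, purely for illustration) is to count how many experts are hit at least once in a batch under top-k gating:

    # Illustrative check: fraction of experts that receive at least one token.
    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    num_tokens, d_model, num_experts, top_k = 512, 64, 32, 2

    gate = torch.nn.Linear(d_model, num_experts, bias=False)  # stand-in router
    tokens = torch.randn(num_tokens, d_model)

    probs = F.softmax(gate(tokens), dim=-1)            # (num_tokens, num_experts)
    _, expert_idx = probs.topk(top_k, dim=-1)          # selected expert indices
    activated = torch.unique(expert_idx).numel()
    print(f"activated experts: {activated}/{num_experts} "
          f"({100.0 * activated / num_experts:.1f}%)")

Tracking this ratio during training is one simple way to see whether routing concentrates on a small subset of experts.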

Language-routing mixture of experts for multilingual and code-switching speech recognition

W Wang, G Ma, Y Li, B Du - arXiv preprint arXiv:2307.05956, 2023 - arxiv.org
Multilingual speech recognition for both monolingual and code-switching speech is a
challenging task. Recently, based on the Mixture of Experts (MoE), many works have made …

MoEC: Mixture of expert clusters

Y **e, S Huang, T Chen, F Wei - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Sparse Mixture of Experts (MoE) has received great interest due to its promising
scaling capability with affordable computational overhead. MoE models convert dense …

MoLE: Mixture of language experts for multi-lingual automatic speech recognition

Y Kwon, SW Chung - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
Multi-lingual speech recognition aims to distinguish linguistic expressions in different
languages and integrate acoustic processing simultaneously. In contrast, current …

MH-MoE: Multi-Head Mixture-of-Experts

S Huang, X Wu, S Ma, F Wei - arXiv preprint arXiv:2411.16205, 2024 - arxiv.org
Multi-Head Mixture-of-Experts (MH-MoE) demonstrates superior performance by using the
multi-head mechanism to collectively attend to information from various representation …
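
The multi-head mechanism described here splits each token along the hidden dimension into several sub-tokens, routes the sub-tokens to experts independently, and merges the results back into a full token. A hedged PyTorch sketch of that idea follows; the top-1 routing, layer sizes, and merge projection are simplifying assumptions, not the paper's exact architecture.

    # Illustrative multi-head MoE: route sub-tokens (token slices) to experts.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiHeadMoE(nn.Module):
        def __init__(self, d_model, num_heads, num_experts):
            super().__init__()
            assert d_model % num_heads == 0
            self.num_heads = num_heads
            self.d_head = d_model // num_heads
            self.gate = nn.Linear(self.d_head, num_experts, bias=False)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(self.d_head, 4 * self.d_head), nn.GELU(),
                              nn.Linear(4 * self.d_head, self.d_head))
                for _ in range(num_experts)
            )
            self.merge = nn.Linear(d_model, d_model)

        def forward(self, x):
            # x: (num_tokens, d_model) -> sub-tokens: (num_tokens * num_heads, d_head)
            n = x.shape[0]
            sub = x.reshape(n * self.num_heads, self.d_head)
            probs = F.softmax(self.gate(sub), dim=-1)
            weight, idx = probs.max(dim=-1)            # top-1 expert per sub-token
            out = torch.zeros_like(sub)
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] = weight[mask].unsqueeze(-1) * expert(sub[mask])
            # Stitch the processed sub-tokens back into full tokens.
            return self.merge(out.reshape(n, -1))

Because routing decisions are made per sub-token rather than per token, more experts tend to be touched for the same batch, which is one way the multi-head view can raise the expert activation discussed in the earlier snippet.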

MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts

RSY Teo, TM Nguyen - arXiv preprint arXiv:2410.14574, 2024 - arxiv.org
Sparse Mixture of Experts (SMoE) has become the key to unlocking unparalleled scalability
in deep learning. SMoE has the potential to exponentially increase parameter count while …
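
As the title suggests, the idea is to carry a momentum term across the residual SMoE updates of a deep network rather than applying each layer's output directly. The sketch below shows a generic heavy-ball-style recurrence for that idea only; the coefficients, signs, and the moe() stub are placeholders, not the paper's exact formulation.

    # Generic illustration: accumulate a momentum buffer over residual MoE updates.
    import torch

    def moe(x):
        # Stand-in for a sparse MoE layer output (random projection, illustration only).
        d = x.shape[-1]
        return 0.1 * torch.tanh(x @ torch.randn(d, d) / d ** 0.5)

    def momentum_moe_stack(x, num_layers=4, beta=0.7, alpha=1.0):
        m = torch.zeros_like(x)
        for _ in range(num_layers):
            m = beta * m + moe(x)     # heavy-ball-style accumulation of layer updates
            x = x + alpha * m         # residual step driven by the momentum buffer
        return x

    print(momentum_moe_stack(torch.randn(8, 32)).shape)   # torch.Size([8, 32])

Setting beta to zero in this sketch recovers the plain residual update x + moe(x).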

Inference and Denoise: Causal Inference-Based Neural Speech Enhancement

TA Hsieh, CHH Yang, PY Chen… - 2023 IEEE 33rd …, 2023 - ieeexplore.ieee.org
This study addresses the speech enhancement (SE) task within the causal inference
paradigm by modeling the noise presence as an intervention. Based on the potential …