A review of sparse expert models in deep learning

W Fedus, J Dean, B Zoph - arXiv preprint arXiv:2209.01667, 2022 - arxiv.org
Sparse expert models are a thirty-year-old concept re-emerging as a popular architecture in
deep learning. This class of architecture encompasses Mixture-of-Experts, Switch …

Modular deep learning

J Pfeiffer, S Ruder, I Vulić, EM Ponti - arXiv preprint arXiv:2302.11529, 2023 - arxiv.org
Transfer learning has recently become the dominant paradigm of machine learning.
Pre-trained models fine-tuned for downstream tasks achieve better performance with fewer …

On the representation collapse of sparse mixture of experts

Z Chi, L Dong, S Huang, D Dai, S Ma… - Advances in …, 2022 - proceedings.neurips.cc
Sparse mixture of experts provides larger model capacity while requiring a constant
computational overhead. It employs the routing mechanism to distribute input tokens to the …
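
The routing mechanism referred to here is, in most sparse MoE designs, a learned gate that scores each token against every expert and dispatches the token only to the top-k of them. A minimal PyTorch sketch of such top-k token routing follows; the class names, expert width, and hyperparameters are illustrative assumptions rather than details taken from the cited paper.

    # Minimal top-k token routing for a sparse MoE layer (illustrative sketch).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKRouter(nn.Module):
        def __init__(self, d_model, num_experts, top_k=2):
            super().__init__()
            self.gate = nn.Linear(d_model, num_experts, bias=False)
            self.top_k = top_k

        def forward(self, x):
            # x: (num_tokens, d_model); score every token against every expert.
            probs = F.softmax(self.gate(x), dim=-1)                # (num_tokens, num_experts)
            weights, expert_idx = probs.topk(self.top_k, dim=-1)
            weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize the kept gates
            return weights, expert_idx                             # each (num_tokens, top_k)

    class SparseMoE(nn.Module):
        def __init__(self, d_model, d_ff, num_experts, top_k=2):
            super().__init__()
            self.router = TopKRouter(d_model, num_experts, top_k)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            )

        def forward(self, x):
            weights, expert_idx = self.router(x)
            out = torch.zeros_like(x)
            # Each token is processed only by its selected experts (loop kept for clarity).
            for e, expert in enumerate(self.experts):
                for slot in range(expert_idx.shape[-1]):
                    mask = expert_idx[:, slot] == e
                    if mask.any():
                        out[mask] = out[mask] + weights[mask, slot].unsqueeze(-1) * expert(x[mask])
            return out

For example, SparseMoE(d_model=64, d_ff=256, num_experts=8)(torch.randn(16, 64)) returns a (16, 64) tensor in which each token combines the outputs of its two highest-scoring experts; parameter count grows with num_experts while per-token compute stays roughly constant, which is the constant-overhead property the snippet describes.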

Multi-head mixture-of-experts

X Wu, S Huang, W Wang, F Wei - arXiv preprint arXiv:2404.15045, 2024 - arxiv.org
Sparse Mixtures of Experts (SMoE) scales model capacity without significant increases in
training and inference costs, but exhibits the following two issues: (1) Low expert activation …
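
"Low expert activation" generally means that only a small fraction of the experts receives tokens under the learned routing. A quick, self-contained diagnostic (with an untrained gate and arbitrary sizes, purely for illustration) is to count how many experts are hit at least once in a batch under top-k gating:

    # Illustrative check: fraction of experts that receive at least one token.
    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    num_tokens, d_model, num_experts, top_k = 512, 64, 32, 2

    gate = torch.nn.Linear(d_model, num_experts, bias=False)  # stand-in router
    tokens = torch.randn(num_tokens, d_model)

    probs = F.softmax(gate(tokens), dim=-1)            # (num_tokens, num_experts)
    _, expert_idx = probs.topk(top_k, dim=-1)          # selected expert indices
    activated = torch.unique(expert_idx).numel()
    print(f"activated experts: {activated}/{num_experts} "
          f"({100.0 * activated / num_experts:.1f}%)")

Tracking this ratio during training is one simple way to see whether routing concentrates on a small subset of experts.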

Language-routing mixture of experts for multilingual and code-switching speech recognition

W Wang, G Ma, Y Li, B Du - arXiv preprint arXiv:2307.05956, 2023 - arxiv.org
Multilingual speech recognition for both monolingual and code-switching speech is a
challenging task. Recently, based on the Mixture of Experts (MoE), many works have made …

MoEC: Mixture of expert clusters

Y **e, S Huang, T Chen, F Wei - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Sparse Mixture of Experts (MoE) has received great interest due to its promising
scaling capability with affordable computational overhead. MoE models convert dense …

MoLE: Mixture of language experts for multi-lingual automatic speech recognition

Y Kwon, SW Chung - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
Multi-lingual speech recognition aims to distinguish linguistic expressions in different
languages and integrate acoustic processing simultaneously. In contrast, current …

MH-MoE: Multi-Head Mixture-of-Experts

S Huang, X Wu, S Ma, F Wei - arXiv preprint arXiv:2411.16205, 2024 - arxiv.org
Multi-Head Mixture-of-Experts (MH-MoE) demonstrates superior performance by using the
multi-head mechanism to collectively attend to information from various representation …
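
The multi-head mechanism described here splits each token along the hidden dimension into several sub-tokens, routes the sub-tokens to experts independently, and merges the results back into a full token. A hedged PyTorch sketch of that idea follows; the top-1 routing, layer sizes, and merge projection are simplifying assumptions, not the paper's exact architecture.

    # Illustrative multi-head MoE: route sub-tokens (token slices) to experts.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiHeadMoE(nn.Module):
        def __init__(self, d_model, num_heads, num_experts):
            super().__init__()
            assert d_model % num_heads == 0
            self.num_heads = num_heads
            self.d_head = d_model // num_heads
            self.gate = nn.Linear(self.d_head, num_experts, bias=False)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(self.d_head, 4 * self.d_head), nn.GELU(),
                              nn.Linear(4 * self.d_head, self.d_head))
                for _ in range(num_experts)
            )
            self.merge = nn.Linear(d_model, d_model)

        def forward(self, x):
            # x: (num_tokens, d_model) -> sub-tokens: (num_tokens * num_heads, d_head)
            n = x.shape[0]
            sub = x.reshape(n * self.num_heads, self.d_head)
            probs = F.softmax(self.gate(sub), dim=-1)
            weight, idx = probs.max(dim=-1)            # top-1 expert per sub-token
            out = torch.zeros_like(sub)
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] = weight[mask].unsqueeze(-1) * expert(sub[mask])
            # Stitch the processed sub-tokens back into full tokens.
            return self.merge(out.reshape(n, -1))

Because routing decisions are made per sub-token rather than per token, more experts tend to be touched for the same batch, which is one way the multi-head view can raise the expert activation discussed in the earlier snippet.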

MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts

RSY Teo, TM Nguyen - arXiv preprint arXiv:2410.14574, 2024 - arxiv.org
Sparse Mixture of Experts (SMoE) has become the key to unlocking unparalleled scalability
in deep learning. SMoE has the potential to exponentially increase parameter count while …
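
As the title suggests, the idea is to carry a momentum term across the residual SMoE updates of a deep network rather than applying each layer's output directly. The sketch below shows a generic heavy-ball-style recurrence for that idea only; the coefficients, signs, and the moe() stub are placeholders, not the paper's exact formulation.

    # Generic illustration: accumulate a momentum buffer over residual MoE updates.
    import torch

    def moe(x):
        # Stand-in for a sparse MoE layer output (random projection, illustration only).
        d = x.shape[-1]
        return 0.1 * torch.tanh(x @ torch.randn(d, d) / d ** 0.5)

    def momentum_moe_stack(x, num_layers=4, beta=0.7, alpha=1.0):
        m = torch.zeros_like(x)
        for _ in range(num_layers):
            m = beta * m + moe(x)     # heavy-ball-style accumulation of layer updates
            x = x + alpha * m         # residual step driven by the momentum buffer
        return x

    print(momentum_moe_stack(torch.randn(8, 32)).shape)   # torch.Size([8, 32])

Setting beta to zero in this sketch recovers the plain residual update x + moe(x).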

Inference and Denoise: Causal Inference-Based Neural Speech Enhancement

TA Hsieh, CHH Yang, PY Chen… - 2023 IEEE 33rd …, 2023 - ieeexplore.ieee.org
This study addresses the speech enhancement (SE) task within the causal inference
paradigm by modeling the noise presence as an intervention. Based on the potential …