A review of sparse expert models in deep learning
Sparse expert models are a thirty-year-old concept re-emerging as a popular architecture in deep learning. This class of architecture encompasses Mixture-of-Experts, Switch …
Modular deep learning
Transfer learning has recently become the dominant paradigm of machine learning. Pre-
trained models fine-tuned for downstream tasks achieve better performance with fewer …
On the representation collapse of sparse mixture of experts
Sparse mixture of experts provides larger model capacity while requiring constant computational overhead. It employs a routing mechanism to distribute input tokens to the …
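The routing mechanism these abstracts refer to is typically a learned top-k gate: a router scores every expert for each token, only the k best-scoring experts run, and their outputs are combined with renormalised gate weights. Below is a minimal PyTorch sketch of that pattern; the layer sizes, the softmax-over-top-k combination, and the class name are illustrative assumptions rather than the exact formulation of any cited paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Sketch of a sparse MoE layer with learned top-k token routing."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # router: scores every expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.gate(x)                            # (tokens, n_experts)
        top_val, top_idx = logits.topk(self.k, dim=-1)   # keep only the k best-matched experts
        weights = F.softmax(top_val, dim=-1)             # renormalise gate weights over those k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e             # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: 16 tokens of width 512, each processed by only 2 of the 8 experts.
tokens = torch.randn(16, 512)
print(TopKMoE()(tokens).shape)  # torch.Size([16, 512])
```

Only k of the n_experts feed-forward blocks execute per token, which is what keeps per-token compute roughly constant as the total expert count (and parameter count) grows.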
Multi-head mixture-of-experts
Sparse Mixture of Experts (SMoE) scales model capacity without significant increases in training and inference costs, but exhibits the following two issues: (1) Low expert activation …
Language-routing mixture of experts for multilingual and code-switching speech recognition
W Wang, G Ma, Y Li, B Du - arXiv preprint arXiv:2307.05956, 2023
Multilingual speech recognition for both monolingual and code-switching speech is a
challenging task. Recently, based on the Mixture of Experts (MoE), many works have made …
Moec: Mixture of expert clusters
Sparse Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead. MoE models convert dense …
Mole: Mixture of language experts for multi-lingual automatic speech recognition
Multi-lingual speech recognition aims to distinguish linguistic expressions in different
languages and integrate acoustic processing simultaneously. In contrast, current …
MH-MoE: Multi-Head Mixture-of-Experts
Multi-Head Mixture-of-Experts (MH-MoE) demonstrates superior performance by using the
multi-head mechanism to collectively attend to information from various representation …
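The multi-head mechanism named in the two MH-MoE entries splits each token into several sub-tokens, routes each sub-token independently, and merges the expert outputs back into one token, which increases how many experts get activated. The self-contained sketch below illustrates that idea; the head count, top-1 routing, and linear experts are simplifying assumptions, not the exact MH-MoE design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadMoESketch(nn.Module):
    """Sketch: split tokens into sub-tokens, route each sub-token, merge back."""
    def __init__(self, d_model=512, heads=4, n_experts=8):
        super().__init__()
        assert d_model % heads == 0
        self.heads, self.d_head = heads, d_model // heads
        self.gate = nn.Linear(self.d_head, n_experts)   # router operates on sub-tokens
        self.experts = nn.ModuleList(
            [nn.Linear(self.d_head, self.d_head) for _ in range(n_experts)]
        )
        self.merge = nn.Linear(d_model, d_model)         # recombine the heads of each token

    def forward(self, x):  # x: (tokens, d_model)
        t = x.shape[0]
        sub = x.reshape(t * self.heads, self.d_head)     # every token becomes `heads` sub-tokens
        probs = F.softmax(self.gate(sub), dim=-1)
        top1 = probs.argmax(dim=-1)                      # each sub-token picks one expert
        out = torch.zeros_like(sub)
        for e, expert in enumerate(self.experts):
            mask = top1 == e
            if mask.any():
                out[mask] = probs[mask, e].unsqueeze(-1) * expert(sub[mask])
        return self.merge(out.reshape(t, self.heads * self.d_head))

# Usage: more (smaller) routing decisions per token than a plain MoE layer.
print(MultiHeadMoESketch()(torch.randn(16, 512)).shape)  # torch.Size([16, 512])
```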
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
Sparse Mixture of Experts (SMoE) has become the key to unlocking unparalleled scalability
in deep learning. SMoE has the potential to exponentially increase parameter count while …
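The momentum integration mentioned above can be pictured as treating the stack of residual MoE layers like an optimizer and giving the layer updates a heavy-ball momentum buffer. The sketch below is a loose illustration under that assumption; the `beta` and `step` coefficients, the function name, and the plain linear layers standing in for MoE layers in the usage lines are not taken from the paper.

```python
import torch

def momentum_smoe_forward(x, moe_layers, beta=0.7, step=1.0):
    """Run a stack of token-wise MoE layers with a shared heavy-ball momentum buffer."""
    momentum = torch.zeros_like(x)
    for moe in moe_layers:
        momentum = beta * momentum + moe(x)  # fold previous layers' outputs into the update
        x = x + step * momentum              # momentum-driven residual update
    return x

# Usage with any token-wise layers of matching width (plain linears stand in
# for sparse MoE layers here, purely for illustration).
layers = [torch.nn.Linear(512, 512) for _ in range(4)]
print(momentum_smoe_forward(torch.randn(16, 512), layers).shape)  # torch.Size([16, 512])
```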
Inference and Denoise: Causal Inference-Based Neural Speech Enhancement
This study addresses the speech enhancement (SE) task within the causal inference
paradigm by modeling the noise presence as an intervention. Based on the potential …