SpeechMoE: Scaling to large acoustic models with dynamic routing mixture of experts
Recently, Mixture of Experts (MoE) based Transformer has shown promising results in many
domains. This is largely due to the following advantages of this architecture: firstly, MoE …
Attention-Based LSTM with Multi-Task Learning for Distant Speech Recognition.
Distant speech recognition is a highly challenging task due to background noise,
reverberation, and speech overlap. Recently, there has been an increasing focus on …
A study of enhancement, augmentation, and autoencoder methods for domain adaptation in distant speech recognition
Speech recognizers trained on close-talking speech do not generalize to distant speech and
the word error rate degradation can be as large as 40% absolute. Most studies focus on …
An investigation into using parallel data for far-field speech recognition
Far-field speech recognition is an important yet challenging task due to low signal to noise
ratio. In this paper, three novel deep neural network architectures are explored to improve …
Recurrent models for auditory attention in multi-microphone distance speech recognition
Integration of multiple microphone data is one of the key ways to achieve robust speech
recognition in noisy environments or when the speaker is located at some distance from the …
DFSMN-SAN with persistent memory model for automatic speech recognition
Self-attention networks (SAN) have been introduced into automatic speech recognition
(ASR) and achieved state-of-the-art performance owing to its superior ability in capturing …
Neural network based multi-factor aware joint training for robust speech recognition
Although great progress has been made in automatic speech recognition (ASR), significant
performance degradation still exists in noisy environments. In this paper, a novel factor …
Iterative Learning of Speech Recognition Models for Air Traffic Control.
Automatic Speech Recognition (ASR) has recently proved to be a useful tool to
reduce the workload of air traffic controllers leading to significant gains in operational …
Integrated adaptation with multi-factor joint-learning for far-field speech recognition
Although great progress has been made in automatic speech recognition (ASR), significant
performance degradation still exists in distant talking scenarios due to significantly lower …
Speaker Verification Using Short Utterances with DNN-Based Estimation of Subglottal Acoustic Features.
Speaker verification in real-world applications sometimes deals with limited duration of
enrollment and/or test data. MFCC-based i-vector systems have defined the state-of-the-art …