SpeechMoE: Scaling to large acoustic models with dynamic routing mixture of experts

Z You, S Feng, D Su, D Yu - arXiv preprint arXiv:2105.03036, 2021 - arxiv.org
Recently, Mixture of Experts (MoE) based Transformer has shown promising results in many
domains. This is largely due to the following advantages of this architecture: firstly, MoE …

Attention-Based LSTM with Multi-Task Learning for Distant Speech Recognition

Y Zhang, P Zhang, Y Yan - Interspeech, 2017 - isca-archive.org
Distant speech recognition is a highly challenging task due to background noise,
reverberation, and speech overlap. Recently, there has been an increasing focus on …

A study of enhancement, augmentation, and autoencoder methods for domain adaptation in distant speech recognition

H Tang, WN Hsu, F Grondin, J Glass - arXiv preprint arXiv:1806.04841, 2018 - arxiv.org
Speech recognizers trained on close-talking speech do not generalize to distant speech and
the word error rate degradation can be as large as 40% absolute. Most studies focus on …

An investigation into using parallel data for far-field speech recognition

Y Qian, T Tan, D Yu - 2016 IEEE International Conference on …, 2016 - ieeexplore.ieee.org
Far-field speech recognition is an important yet challenging task due to low signal to noise
ratio. In this paper, three novel deep neural network architectures are explored to improve …

Recurrent models for auditory attention in multi-microphone distance speech recognition

S Kim, I Lane - arXiv preprint arXiv:1511.06407, 2015 - arxiv.org
Integration of multiple microphone data is one of the key ways to achieve robust speech
recognition in noisy environments or when the speaker is located at some distance from the …

DFSMN-SAN with persistent memory model for automatic speech recognition

Z You, D Su, J Chen, C Weng… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
Self-attention networks (SAN) have been introduced into automatic speech recognition
(ASR) and achieved state-of-the-art performance owing to their superior ability in capturing …

Neural network based multi-factor aware joint training for robust speech recognition

Y Qian, T Tan, D Yu - IEEE/ACM Transactions on Audio …, 2016 - ieeexplore.ieee.org
Although great progress has been made in automatic speech recognition (ASR), significant
performance degradation still exists in noisy environments. In this paper, a novel factor …

Iterative Learning of Speech Recognition Models for Air Traffic Control

A Srinivasamurthy, P Motlicek, M Singh, Y Oualil… - …, 2018 - publications.idiap.ch
Abstract Automatic Speech Recognition (ASR) has recently proved to be a useful tool to
reduce the workload of air traffic controllers leading to significant gains in operational …

Integrated adaptation with multi-factor joint-learning for far-field speech recognition

Y Qian, T Tan, D Yu, Y Zhang - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
Although great progress has been made in automatic speech recognition (ASR), significant
performance degradation still exists in distant talking scenarios due to significantly lower …

Speaker Verification Using Short Utterances with DNN-Based Estimation of Subglottal Acoustic Features

J Guo, G Yeung, D Muralidharan, H Arsikere… - …, 2016 - academia.edu
Speaker verification in real-world applications sometimes deals with limited duration of
enrollment and/or test data. MFCC-based i-vector systems have defined the state-of-the-art …