Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
An overview of end-to-end automatic speech recognition
D Wang, X Wang, S Lv - Symmetry, 2019 - mdpi.com
Automatic speech recognition, especially large vocabulary continuous speech recognition,
is an important issue in the field of machine learning. For a long time, the hidden Markov …
End-to-end speech recognition with word-based RNN language models
This paper investigates the impact of word-based RNN language models (RNN-LMs) on the
performance of end-to-end automatic speech recognition (ASR). In our prior work, we have …
Advancing acoustic-to-word CTC model
The acoustic-to-word model based on the connectionist temporal classification (CTC)
criterion was shown as a natural end-to-end (E2E) model directly targeting words as output …
Towards code-switching ASR for end-to-end CTC models
Although great progress has been made on end-to-end (E2E) models for monolingual and
multilingual automatic speech recognition (ASR), there is no successful study for E2E …
Transformer ASR with contextual block processing
E Tsunoo, Y Kashiwagi, T Kumakura… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
The Transformer self-attention network has recently shown promising performance as an
alternative to recurrent neural networks (RNNs) in end-to-end (E2E) automatic speech …
The SpeechTransformer for large-scale Mandarin Chinese speech recognition
J Li, X Wang, Y Li - ICASSP 2019-2019 IEEE International …, 2019 - ieeexplore.ieee.org
Attention-based sequence-to-sequence architectures have made great progress in the
speech recognition task. The SpeechTransformer, a no-recurrence encoder-decoder …
Augmented generalized deep learning with special vocabulary
J Ward, A Sypniewski, S Stephenson - US Patent 10,210,860, 2019 - Google Patents
Systems and methods are disclosed for customizing a neural network for a custom dataset,
when the neural network has been trained on data from a general dataset. The neural …
Pushing the boundaries of audiovisual word recognition using residual networks and LSTMs
Visual and audiovisual speech recognition are witnessing a renaissance which is largely
due to the advent of deep learning methods. In this paper, we present a deep learning …
Leveraging sequence-to-sequence speech synthesis for enhancing acoustic-to-word speech recognition
Encoder-decoder models for acoustic-to-word (A2W) automatic speech recognition (ASR)
are attractive for their simplicity of architecture and run-time latency while achieving state-of …