Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

An overview of end-to-end automatic speech recognition

D Wang, X Wang, S Lv - Symmetry, 2019 - mdpi.com
Automatic speech recognition, especially large vocabulary continuous speech recognition,
is an important issue in the field of machine learning. For a long time, the hidden Markov …

End-to-end speech recognition with word-based RNN language models

T Hori, J Cho, S Watanabe - 2018 IEEE spoken language …, 2018 - ieeexplore.ieee.org
This paper investigates the impact of word-based RNN language models (RNN-LMs) on the
performance of end-to-end automatic speech recognition (ASR). In our prior work, we have …

Advancing acoustic-to-word CTC model

J Li, G Ye, A Das, R Zhao… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
The acoustic-to-word model based on the connectionist temporal classification (CTC)
criterion was shown as a natural end-to-end (E2E) model directly targeting words as output …

Towards code-switching ASR for end-to-end CTC models

K Li, J Li, G Ye, R Zhao, Y Gong - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
Although great progress has been made on end-to-end (E2E) models for monolingual and
multilingual automatic speech recognition (ASR), there is no successful study for E2E …

Transformer ASR with contextual block processing

E Tsunoo, Y Kashiwagi, T Kumakura… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
The Transformer self-attention network has recently shown promising performance as an
alternative to recurrent neural networks (RNNs) in end-to-end (E2E) automatic speech …

The SpeechTransformer for large-scale Mandarin Chinese speech recognition

J Li, X Wang, Y Li - ICASSP 2019-2019 IEEE International …, 2019 - ieeexplore.ieee.org
Attention-based sequence-to-sequence architectures have made great progress in the
speech recognition task. The SpeechTransformer, a no-recurrence encoder-decoder …

Augmented generalized deep learning with special vocabulary

J Ward, A Sypniewski, S Stephenson - US Patent 10,210,860, 2019 - Google Patents
Systems and methods are disclosed for customizing a neural network for a custom dataset,
when the neural network has been trained on data from a general dataset. The neural …

Pushing the boundaries of audiovisual word recognition using residual networks and LSTMs

T Stafylakis, MH Khan, G Tzimiropoulos - Computer Vision and Image …, 2018 - Elsevier
Visual and audiovisual speech recognition are witnessing a renaissance which is largely
due to the advent of deep learning methods. In this paper, we present a deep learning …

Leveraging sequence-to-sequence speech synthesis for enhancing acoustic-to-word speech recognition

M Mimura, S Ueno, H Inaguma, S Sakai… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
Encoder-decoder models for acoustic-to-word (A2W) automatic speech recognition (ASR)
are attractive for their simplicity of architecture and run-time latency while achieving state-of …