Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

RWTH ASR Systems for LibriSpeech: Hybrid vs Attention – w/o Data Augmentation

C Lüscher, E Beck, K Irie, M Kitza, W Michel… - arXiv preprint arXiv …, 2019 - arxiv.org
We present state-of-the-art automatic speech recognition (ASR) systems employing a
standard hybrid DNN/HMM architecture compared to an attention-based encoder-decoder …

Are multidimensional recurrent layers really necessary for handwritten text recognition?

J Puigcerver - 2017 14th IAPR international conference on …, 2017 - ieeexplore.ieee.org
Current state-of-the-art approaches to offline Handwritten Text Recognition extensively rely
on Multidimensional Long Short-Term Memory networks. However, these architectures …

Improved training of end-to-end attention models for speech recognition

A Zeyer, K Irie, R Schlüter, H Ney - arXiv preprint arXiv:1805.03294, 2018 - arxiv.org
Sequence-to-sequence attention-based models on subword units allow simple
open-vocabulary end-to-end speech recognition. In this work, we show that such models can …

Handwriting recognition with large multidimensional long short-term memory recurrent neural networks

P Voigtlaender, P Doetsch… - 2016 15th international …, 2016 - ieeexplore.ieee.org
Multidimensional long short-term memory recurrent neural networks achieve impressive
results for handwriting recognition. However, with current CPU-based implementations, their …

A comprehensive study of deep bidirectional LSTM RNNs for acoustic modeling in speech recognition

A Zeyer, P Doetsch, P Voigtlaender… - … on acoustics, speech …, 2017 - ieeexplore.ieee.org
Recent experiments show that deep bidirectional long short-term memory (BLSTM) recurrent
neural network acoustic models outperform feedforward neural networks for automatic …

Generating synthetic audio data for attention-based speech recognition systems

N Rossenbach, A Zeyer, R Schlüter… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Recent advances in text-to-speech (TTS) led to the development of flexible multi-speaker
end-to-end TTS systems. We extend state-of-the-art attention-based automatic speech …

Attention based on-device streaming speech recognition with large speech corpus

K Kim, K Lee, D Gowda, J Park, S Kim… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
In this paper, we present a new on-device automatic speech recognition (ASR) system
based on monotonic chunk-wise attention (MoChA) models trained with large (> 10K hours) …

On using SpecAugment for end-to-end speech translation

P Bahar, A Zeyer, R Schlüter, H Ney - arXiv preprint arXiv:1911.08876, 2019 - arxiv.org
This work investigates a simple data augmentation technique, SpecAugment, for end-to-end
speech translation. SpecAugment is a low-cost implementation method applied directly to …