An overview of end-to-end automatic speech recognition

D Wang, X Wang, S Lv - Symmetry, 2019 - mdpi.com
Automatic speech recognition, especially large vocabulary continuous speech recognition,
is an important issue in the field of machine learning. For a long time, the hidden Markov …

Adaptation algorithms for neural network-based speech recognition: An overview

P Bell, J Fainberg, O Klejch, J Li… - IEEE Open Journal …, 2020 - ieeexplore.ieee.org
We present a structured overview of adaptation algorithms for neural network-based speech
recognition, considering both hybrid hidden Markov model/neural network systems and end …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

End-to-end attention-based large vocabulary speech recognition

D Bahdanau, J Chorowski, D Serdyuk… - … on acoustics, speech …, 2016 - ieeexplore.ieee.org
Many state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) Systems
are hybrids of neural networks and Hidden Markov Models (HMMs). Recently, more direct …

An actor-critic algorithm for sequence prediction

D Bahdanau, P Brakel, K Xu, A Goyal, R Lowe… - arXiv preprint arXiv …, 2016 - arxiv.org
We present an approach to training neural networks to generate sequences using actor-critic
methods from reinforcement learning (RL). Current log-likelihood training methods are …

cuDNN: Efficient primitives for deep learning

S Chetlur, C Woolley, P Vandermersch… - arXiv preprint arXiv …, 2014 - arxiv.org
We present a library of efficient implementations of deep learning primitives. Deep learning
workloads are computationally intensive, and optimizing their kernels is difficult and time …

EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding

Y Miao, M Gowayyed, F Metze - 2015 IEEE workshop on …, 2015 - ieeexplore.ieee.org
The performance of automatic speech recognition (ASR) has improved tremendously due to
the application of deep neural networks (DNNs). Despite this progress, building a new ASR …

TED-LIUM 3: Twice as much data and corpus repartition for experiments on speaker adaptation

F Hernandez, V Nguyen, S Ghannay… - Speech and Computer …, 2018 - Springer
In this paper, we present TED-LIUM release 3 corpus (TED-LIUM 3 is available on
https://lium.univ-lemans.fr/ted-lium3/) dedicated to speech recognition in English, which …

Streaming automatic speech recognition with the transformer model

N Moritz, T Hori, J Le - ICASSP 2020-2020 IEEE International …, 2020 - ieeexplore.ieee.org
Encoder-decoder based sequence-to-sequence models have demonstrated state-of-the-art
results in end-to-end automatic speech recognition (ASR). Recently, the transformer …

Stochastic fine-grained labeling of multi-state sign glosses for continuous sign language recognition

Z Niu, B Mak - Computer Vision–ECCV 2020: 16th European …, 2020 - Springer
In this paper, we propose novel stochastic modeling of various components of a continuous
sign language recognition (CSLR) system that is based on the transformer encoder and …